Sunday, 23 April 2017

New in SAP HANA, express edition: Streaming Analytics

SAP HANA smart data streaming (SDS) is HANA’s high-speed, real-time streaming analytics engine. It lets you easily build and deploy streaming data models (a.k.a. projects) that process and analyze incoming messages as fast as they arrive, allowing you to react in real time to what’s going on. Use it to generate alerts or initiate an immediate response. With the Internet of Things (IoT), common uses include analyzing incoming sensor data from smart devices, particularly when there is a need to react in real time to a new opportunity or in anticipation of a problem.

Use Cases


Streaming Analytics can be applied in a wide variety of use cases – wherever there is fast moving data and value from understanding and acting on it as soon as things happen. Common use cases include:
  • Predictive maintenance, predictive quality: detect indications of impending failure in time to take preventative action
  • Marketing: customized offers in real time, reacting to customer activity
  • Fraud/threat detection/prevention: detect and flag patterns of events that indicate possible fraud or an active threat
  • Location monitoring: detect when equipment/assets are not where they are supposed to be
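To make the fraud/threat bullet a little more concrete, here is a minimal Python sketch of one such pattern: a login success that follows several failures on the same account within a short time. It is not CCL and not how SDS implements pattern matching; the event shape (`account`, `kind`, `ts`) and the thresholds are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    account: str  # hypothetical key field
    kind: str     # hypothetical event type: "fail" or "success"
    ts: float     # event time in seconds


def flag_suspicious(events: Iterable[Event], max_fails: int = 3,
                    window_s: float = 60.0) -> List[Event]:
    """Flag a success preceded by at least `max_fails` failures on the
    same account within the last `window_s` seconds."""
    recent_fails: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Event] = []
    for ev in events:
        fails = recent_fails[ev.account]
        # Drop failures that have aged out of the time window.
        while fails and ev.ts - fails[0] > window_s:
            fails.popleft()
        if ev.kind == "fail":
            fails.append(ev.ts)
        elif ev.kind == "success":
            if len(fails) >= max_fails:
                alerts.append(ev)
            fails.clear()  # a success resets the pattern for this account
    return alerts
```

For example, three failures followed by a success within a minute are flagged, while the same failures spread over several minutes are not.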

Streaming data models


Streaming data models define the operations to apply to incoming messages and are contained in streaming projects that run on the SDS server. These models are defined in a SQL-like language that we call CCL (continuous computation language); it’s really just SQL with some extensions for processing live streams. The big difference, though, is that this SQL doesn’t execute against the HANA database. Instead, it gets compiled into a set of “continuous queries” that run in the SDS dataflow engine.

Here’s a simple example of a streaming data model that smooths out some sensor data by computing a five-minute moving average for each device:

CREATE INPUT STREAM DeviceIn
SCHEMA (Id string, Value integer);

CREATE OUTPUT WINDOW MvAvg
PRIMARY KEY DEDUCED
AS SELECT
   DeviceIn.Id AS Id ,
   avg(DeviceIn.Value) AS AvgValue
FROM DeviceIn KEEP 5 MINUTES
GROUP BY DeviceIn.Id ;

You can see that it looks pretty much like standard SQL, except that instead of creating tables we are creating streams and windows. With windows, we can define a retention policy, in this example KEEP 5 MINUTES. And a moving average is just the start. Filtering events is as simple as adding a WHERE clause. You can join event streams to HANA tables to combine live events with reference data or historical data, and you can join events to each other. You can also match or correlate events and watch for patterns or trends. You get the idea.
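The semantics of the MvAvg window can be approximated in plain Python, which may help readers new to streaming. This is a sketch under stated assumptions, not how the SDS engine works: each reading carries an explicit timestamp in seconds (the DeviceIn schema has none), retention is checked only when a new reading arrives, and the optional `where` predicate plays the role of a WHERE clause. The function returns the window contents after the last reading rather than emitting every intermediate update.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, Optional, Tuple

# (timestamp_s, Id, Value): DeviceIn's columns plus an assumed timestamp
Reading = Tuple[float, str, int]


def moving_average(readings: Iterable[Reading], keep_s: float = 300.0,
                   where: Optional[Callable[[str, int], bool]] = None
                   ) -> Dict[str, float]:
    window: Deque[Reading] = deque()   # retained readings, oldest first
    sums: Dict[str, int] = defaultdict(int)    # Id -> sum of retained Values
    counts: Dict[str, int] = defaultdict(int)  # Id -> number of retained Values
    for ts, dev_id, value in readings:
        if where is not None and not where(dev_id, value):
            continue  # like WHERE: filtered readings never enter the window
        # Evict readings older than the retention period (like KEEP 5 MINUTES).
        while window and ts - window[0][0] >= keep_s:
            _, old_id, old_value = window.popleft()
            sums[old_id] -= old_value
            counts[old_id] -= 1
            if counts[old_id] == 0:
                del sums[old_id], counts[old_id]  # group has no rows left
        window.append((ts, dev_id, value))
        sums[dev_id] += value
        counts[dev_id] += 1
    # One row per Id, like GROUP BY DeviceIn.Id.
    return {k: sums[k] / counts[k] for k in counts}
```

Keeping running sums per Id means each arrival or eviction costs constant time instead of re-averaging the whole window.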

Capturing streaming data in the HANA database


Any of the data can be captured in the HANA database, and by capturing derived data rather than raw data, you can reduce the amount of data being stored. You can also sample the data, or store data only when it changes.
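Two of the reduction strategies mentioned above, storing only changes and sampling, can be sketched in Python as filters over rows like those produced by the MvAvg window. The `min_delta` dead band and the count-based sampling are illustrative choices, not SDS features or settings.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, float]  # (Id, AvgValue), e.g. rows emitted by MvAvg


def changes_only(rows: Iterable[Row], min_delta: float = 0.0) -> Iterator[Row]:
    """Yield a row only when its value differs from the last stored value
    for that Id by more than `min_delta`."""
    last: Dict[str, float] = {}
    for key, value in rows:
        prev = last.get(key)
        if prev is None or abs(value - prev) > min_delta:
            last[key] = value
            yield key, value


def sample_every(rows: Iterable[Row], n: int) -> Iterator[Row]:
    """Keep every n-th row of the stream (n >= 1), regardless of Id."""
    for i, row in enumerate(rows):
        if i % n == 0:
            yield row
```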

If I wanted to store the moving average from the example above in a HANA table called MV_AVG, I would simply attach a HANA output adapter to the MvAvg window by adding this statement to my project:

ATTACH OUTPUT ADAPTER HANA_Output1
TYPE hana_out TO MvAvg
PROPERTIES
   service = 'hdb1',
   sourceSchema = 'MY_SCHEMA',
   table = 'MV_AVG';
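HANA isn't available in a self-contained example, so this sketch uses Python's built-in `sqlite3` module as a stand-in to show the effect of mirroring a keyed window into a table: because MvAvg has one row per Id, each update replaces that Id's row rather than adding a new one. The assumption that the adapter behaves like an upsert on the primary key is ours; check the adapter documentation for its actual modes and properties.

```python
import sqlite3
from typing import Iterable, Tuple


def mirror_window(conn: sqlite3.Connection,
                  updates: Iterable[Tuple[str, float]]) -> None:
    """Apply keyed window updates (Id, AvgValue) to MV_AVG as upserts.

    SQLite stands in for HANA here; this is NOT the hana_out adapter.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS MV_AVG (Id TEXT PRIMARY KEY, AvgValue REAL)")
    with conn:  # commit the batch as one transaction
        # A newer value for an existing Id replaces the old row.
        conn.executemany(
            "INSERT OR REPLACE INTO MV_AVG (Id, AvgValue) VALUES (?, ?)", updates)
```

The resulting table holds one current row per device instead of every raw reading. An append-only history table would instead use a plain INSERT for each update.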

Connecting to data sources


SDS includes an integrated web service that can expose a REST interface for all input streams. High-frequency publishers can use WebSockets for greater efficiency.

SDS also includes a range of pre-built adapters including Kafka, JMS, file loaders and others. An adapter toolkit (Java) makes it easy to build custom adapters.
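Whatever the source, an input adapter's core job is to turn external data into rows that match an input stream's schema. The sketch below shows only that translation step, in Python, for a hypothetical CSV feed targeting the DeviceIn stream; the real adapter toolkit is Java and has its own API, which is not shown here.

```python
import csv
import io
from typing import Iterator, List, Tuple


def parse_device_csv(text: str, rejects: List[str]) -> Iterator[Tuple[str, int]]:
    """Turn 'Id,Value' CSV lines into (Id, Value) rows matching the DeviceIn
    schema (Id string, Value integer). Malformed lines are added to `rejects`."""
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        try:
            dev_id, raw_value = record  # wrong column count raises ValueError
            yield dev_id.strip(), int(raw_value)
        except ValueError:
            rejects.append(",".join(record))
```

Collecting rejected lines, rather than stopping on the first bad record, keeps a live feed flowing while still leaving a trail for troubleshooting.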

Using Real-time output


In addition to capturing output from streaming projects in HANA database tables, you can stream real-time output to applications and dashboards, publish it to a Kafka or JMS message queue, send it as email, or store it in Hadoop.

Machine Learning for Predictive Analytics on Streams


SDS includes two machine learning algorithms, a decision tree algorithm and a clustering algorithm, as well as the ability to import decision tree models built with the Predictive Analysis Library (PAL) in HANA. These are particularly useful for predictive use cases, enabling you to act on leading indicators or to detect unusual situations.
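The clustering idea can be illustrated with sequential (online) k-means, a simple incremental clustering technique used here as a stand-in; it is not the algorithm SDS ships. Each new point nudges its nearest centroid, and points far from every centroid are flagged as unusual. The seed centroids and the `threshold` distance are hypothetical tuning choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class OnlineKMeans:
    """Sequential k-means with a simple distance-based anomaly flag."""

    def __init__(self, initial_centroids: Sequence[Point], threshold: float):
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        self.counts = [1] * len(self.centroids)  # each seed counts as one point
        self.threshold = threshold

    def update(self, point: Point) -> Tuple[int, bool]:
        """Assign `point` to its nearest centroid; return (index, is_unusual)."""
        dists = [math.dist(c, point) for c in self.centroids]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] > self.threshold:
            return best, True  # flag it and leave the model unchanged
        self.counts[best] += 1
        rate = 1.0 / self.counts[best]  # step size that keeps a running mean
        centroid = self.centroids[best]
        for i, x in enumerate(point):
            centroid[i] += rate * (x - centroid[i])
        return best, False
```

Not updating the model on flagged points keeps a burst of outliers from dragging a centroid toward them.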

High speed, Scalable


The SDS dataflow engine is designed to be highly scalable, with support for both scale-up and scale-out, providing the ability to process millions of messages per second (given sufficient CPU capacity) and to deliver results within milliseconds of message arrival.
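Scale-out for keyed aggregations like MvAvg generally relies on partitioning the stream by key, so that each partition can be processed on its own core or node. This generic Python sketch of hash partitioning (not SDS configuration) shows why the approach is safe: every reading for a given Id lands in the same partition, so per-partition averages can simply be merged.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, int]  # (Id, Value)


def partition_of(key: str, n_partitions: int) -> int:
    """Stable hash partitioning: the same Id always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def split(readings: Iterable[Reading], n_partitions: int) -> List[List[Reading]]:
    parts: List[List[Reading]] = [[] for _ in range(n_partitions)]
    for dev_id, value in readings:
        parts[partition_of(dev_id, n_partitions)].append((dev_id, value))
    return parts


def averages(readings: Iterable[Reading]) -> Dict[str, float]:
    """Per-Id average, computed independently within one partition."""
    sums: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for dev_id, value in readings:
        sums[dev_id] += value
        counts[dev_id] += 1
    return {k: sums[k] / counts[k] for k in counts}
```

A stable hash such as CRC-32 is used instead of Python's built-in `hash()`, which is randomized per process for strings.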

Design time tools


Design-time tools for building and testing streaming projects are available both as a plugin for Eclipse and in SAP Web IDE for SAP HANA. Both include a syntax-aware CCL editor plus testing tools such as a stream viewer, record/playback, and manual input tools. The Eclipse plugin also includes a visual, drag-and-drop model builder.

Try it Out


If you’re interested in taking it for a test spin, the easiest way to get started is to follow this hands-on tutorial that takes you through the steps of building a simple IoT project to monitor sensor data from freezer units.
