Showing posts with label spark. Show all posts

Monday, 9 July 2018

Hana Hadoop Integration with Federated Access

Data lake analytics have become a reality, but the challenge is to access the data quickly and provide meaningful insights. There are several techniques for accessing the data faster. In this blog we look at how to integrate HANA with Hadoop to get insights from larger data sets quickly.

If you have both HANA and Hadoop in your ecosystem, SAP provides an option to integrate them using the HANA Spark Controller. In-memory processing can then be used for real-time insights while, in parallel, Hadoop's ability to process huge data sets is put to use.

Monday, 19 September 2016

Optimising HANA Query push-down from Apache Spark

As you start using it with larger tables or views in HANA, you may notice that the query pushed down into HANA is not optimised. SUM and GROUP BY clauses are NOT pushed down into HANA, which may cause a large, granular result set to be moved across the network, only to be aggregated in Spark. That is certainly a waste of HANA's powerful query engine.
In this blog I will demonstrate the problem and show several ways to help get around it, using Apache Spark.
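
As an illustration of the general idea (not necessarily one of the approaches covered in the full post), one common workaround with Spark's JDBC data source is to pass a derived table as the `dbtable` option, so that the aggregation is part of the SQL HANA receives. The sketch below assumes the "RDATA" table has columns named "REGION" and "AMOUNT"; those column names, the helper function names, and the connection details are hypothetical.

```python
def aggregated_dbtable(table, group_cols, sum_col, alias="agg"):
    """Build a derived-table expression for Spark's JDBC `dbtable` option.

    Spark places this text in the FROM clause of the query it sends to the
    database, so the SUM and GROUP BY are evaluated by HANA, not by Spark.
    """
    if not group_cols:
        raise ValueError("at least one grouping column is required")
    quoted = ", ".join(f'"{c}"' for c in group_cols)
    return (
        f'(SELECT {quoted}, SUM("{sum_col}") AS "{sum_col}_SUM" '
        f'FROM "{table}" GROUP BY {quoted}) AS {alias}'
    )


def read_aggregated(spark, url, user, password, dbtable):
    """Load the pre-aggregated result through an existing SparkSession.

    `url`, `user` and `password` are placeholders supplied by the caller.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)                          # e.g. jdbc:sap://<host>:<port>
        .option("driver", "com.sap.db.jdbc.Driver")  # SAP HANA JDBC driver class
        .option("dbtable", dbtable)
        .option("user", user)
        .option("password", password)
        .load()
    )
```

Calling `aggregated_dbtable("RDATA", ["REGION"], "AMOUNT")` produces a string that can be handed to `read_aggregated`, so only one row per "REGION" crosses the network. The trade-off is that the aggregation is fixed in the SQL text rather than expressed through the DataFrame API.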

Using the same dataset from the earlier blog, we can see how more complex Spark SQL (executing against a test table in HANA, "RDATA") is actually pushed down into HANA.

Monday, 12 September 2016

Calling HANA Views from Apache Spark

Open source Apache Spark is fast becoming the de facto standard for Big Data processing and analytics. It’s an ‘in-memory’ data processing engine, utilising the distributed computing power of tens or even thousands of logically linked host machines (a cluster). It’s able to crunch through vast quantities of both structured and unstructured data. You can easily scale out your cluster as your data appetite grows.

In addition, it can be used as a data federation layer spanning traditional databases as well as other popular big data platforms, such as Hadoop HDFS, Hadoop HBase, Cassandra, Amazon Redshift and S3, to name a few.

Thursday, 28 April 2016

Vora 1.2 installation Cheat sheet: Concepts, Requirements and Installation

SAP HANA Vora provides an in-memory processing engine which can scale up to thousands of nodes, both on premises and in the cloud. Vora fits into the Hadoop ecosystem and extends the Spark execution framework.

Concepts and Requirements:

SAP HANA Vora 1.2 consists of the following two main components:
  • SAP HANA Vora Engine:
    • SAP HANA Vora instances hold data in memory and boost performance.
  • SAP HANA Vora Spark Extension Library:
    • Provides access to SAP HANA Vora through Spark.
    • Makes available additional functionality, such as a hierarchy implementation.

Wednesday, 13 April 2016

Introducing SAP HANA Vora 1.2

SAP HANA Vora 1.2 was released recently, and with this new version we have added several new features to the product. Some of the key ones I want to highlight in this blog are:

  • Support for the MapR Hadoop distro
  • A new “OLAP” modeler to build hierarchical data models on Vora data
  • Discovery service using open source Consul – to register Vora services automatically
  • New catalog to replace ZooKeeper as the metadata store
  • Native persistency for the metadata catalog using a distributed shared log
  • Thriftserver for client access through JDBC-Spark connectivity

The new installer in Vora version 1.2 extends the simplified installer so that Hadoop management tools like MapR Control System can be used to deploy Vora on all the Hadoop/Spark nodes. This is in addition to what was provided in version 1.0 for the Cloudera Manager and Ambari admin tools.