Friday, 22 September 2017

Reducing workload on HANA hosts by routing end user query

SAP HANA is known for its versatile capabilities as a database as well as a platform by providing solution for varied end user requirements.

HANA appliance has evolved from storing data in column/row store fashion towards serving as a data container for unstructured data too and helping end user to transform data set like xml and JSON to relational data structure and vice versa with at most ease.

Being a hub of solution for all the end user needs, SAP HANA also helps end user to avail the flexibility in query handling and fine tuning the performance of his/her operations performed on the database.

SAP HANA turning out to be one stop shop for all the big data solutions, it becomes more important to handle 3Vs of big data wing (Volume, Variety and Velocity) in a best possible way. In the endeavor of achieving it, there are new capabilities that are introduced in HANA platform to reduce workload on hosts by balancing load across the worker hosts.

In this blog we will understand, how the query routing operation can be performed by end user on a distributed SAP HANA system.

Before which, we will first conceptualize table replication and table partitioning in HANA which are used to balance load and reduce network traffic across tables in distributed environment.

Table partitioning is one of the key capabilities in HANA that exists to split column-store tables horizontally into disjunctive sub-tables or partitions, so that table with large data is broken into smaller manageable parts and could be placed across the HANA nodes to balance load in a best possible way.

Table Replication is another savior in case of a scale out system to reduce network traffic and fasten the response of a join like query between tables/partitions of tables located across HANA hosts.

With the above two existing features, release of HANA 2.0 SPS02 brought in a flexibility to navigate or route the query on a replicated table to user specified host so as to fasten the response time of the query based on the underlying operation.

Let us understand it deep with an example:

SAP HANA under use in this case is a distributed system with Master and Slave Worker Host.

Execute the below query to get the masterIndexServer details in your respective HANA system :

select CONCAT(CONCAT(HOST,':'),PORT) as masterIndexServer from
"PUBLIC"."M_SERVICES" where COORDINATOR_TYPE = 'MASTER'
AND SERVICE_NAME = 'indexserver';

Use the below SQL to create a partitioned column store table and locate it inside the master HANA node using the details retrieved from step 1

CREATE COLUMN TABLE CUSTOMER_DETAILS (ID INT, NAME NVARCHAR(5000))
PARTITION BY RANGE(ID) (PARTITION 10 <= VALUES < 20, PARTITION VALUE = 25,
 PARTITION OTHERS) AT LOCATION <masterIndexserver>;

Insert data in to the above created table:

INSERT INTO CUSTOMER_DETAILS VALUES(1,'BMW');
INSERT INTO CUSTOMER_DETAILS VALUES(2,'SHELL');
INSERT INTO CUSTOMER_DETAILS VALUES(3,'COLGATE');
INSERT INTO CUSTOMER_DETAILS VALUES(4,'AUDI');
INSERT INTO CUSTOMER_DETAILS VALUES(5,'BENZ');

Check the table/partition location details by using below query :

SELECT * FROM "PUBLIC"."M_TABLE_PERSISTENCE_LOCATIONS"
WHERE TABLE_NAME = 'CUSTOMER_DETAILS'

we must see the below response

SAP HANA Tutorials and Materials, SAP HANA Certifications, SAP HANA Guides, SAP HANA Query

Above query result indicates that all the partitions of the created database table are residing in the Master Worker Host.

Now add asynchronous replica of the above created table in HANA slave node in which the table partition doesn’t  currently exist.

Execute below Query to get the slave Node Details:

select CONCAT(CONCAT(HOST,':'),PORT) as slaveNodeDetails
from "PUBLIC"."M_SERVICES" where COORDINATOR_TYPE = 'SLAVE' 
AND SERVICE_NAME = 'indexserver'
and host != (select host from "PUBLIC"."M_SERVICES" 
where COORDINATOR_TYPE = 'MASTER' AND SERVICE_NAME = 'indexserver' )

Use the slaveNodeDetails retrieved in the above query to add asynchronous table replica as shown below:

alter table CUSTOMER_DETAILS ADD ASYNCHRONOUS REPLICA AT <slaveNodeDetails>

Above statement adds a table replica at slave node for the column store table created in master node .

Enable asynchronous replica on the above created table using below query:

ALTER TABLE CUSTOMER_DETAILS ENABLE ASYNCHRONOUS REPLICA

by which we ensure there are no differences in data between the source table and the replica table.

Any changes done to the source table in master node will asynchronously make the changes to replicated table in slave node.

Now perform planViz operation on the query using the above created asynchronously replicated table.

eg: Execute the below query which performs join operation between a table created in master node whose async table replica exists in slave node and another table which resides in slave node as shown below:

SELECT TOP 10 A.*,B.* FROM CUSTOMER_DETAILS A CROSS JOIN 
"HE2E_SCHEMA"."SALES_MASTERDATA_CUSTOMER" B

and perform planViz operation on the same :

SAP HANA Tutorials and Materials, SAP HANA Certifications, SAP HANA Guides, SAP HANA Query

Now execute the same query mentioned in above step with hint to use the asynchronously replicated table placed in slave node as shown below:

SELECT TOP 10 A.*,B.* FROM CUSTOMER_DETAILS A CROSS JOIN 
"SALES_MASTERDATA_CUSTOMER" B WITH HINT (RESULT_LAG('hana_atr'))

Perform plan viz operation on the same :

SAP HANA Tutorials and Materials, SAP HANA Certifications, SAP HANA Guides, SAP HANA Query

Thus the utility of hint RESULT_LAG(‘hana_atr’)  on an asynchronously replicated table helps end user to decide the routing volume for a specific query based on the underlying operation as per the his requirement to reduced the network traffic and in the end enhance query performance.

No comments:

Post a Comment