Sunday, 13 March 2022

The fastest way to load data from HANA Cloud, HANA into HANA Cloud, HANA Data Lake

Overview

Recently as customers are moving larger and larger tables from HANA into HANA Data Lake, I am being asked what the fastest way is to move data from HANA to HANA Data Lake.  Or more precisely I am asked if there is a faster way then doing a simple HANA INSERT into a HANA Data Lake virtual table.

You may be asking why customers are moving large tables from HANA to HANA Data Lake (HDL) and the most popular use case for this is an initial materialization of a large datasets or archiving older data to HDL.  Most of these customers are using HANA Smart Data Integration (SDI) to do this materialization and often using the same interface for change data capture using SDI Flowgraphs or SDI real-time replication to keep these tables up to date.

The three simple methods of moving data that we will examine here are:

◉ a simple HANA INSERT into a HDL virtual table,

◉ a simple HDL INSERT from HDL accessing a HANA virtual table, and

◉ a HANA export and HDL LOAD.

Now you might be asking:

“Why go through HANA?”

“Why not load the data directly into HDL?”

And again, these customers are using the HANA Enterprise Information Management tools which currently require a HANA object (local or virtual) as a target.  In future BLOGs we’ll discuss loading data directly into HDL via IQ Client-Side Load, Data Services and Data Intelligence.

The fastest way to load data from HANA Cloud, HANA into HANA Cloud, HANA Data Lake is doing it from HDL/IQ and running an INSERT statement with a SELECT from a HANA table using the “create existing local temporary table” to create a proxy table pointing to the HANA physical table (see details below).

Method Rows  Data Size  Time – seconds
HDL/IQ INSERT..SELECT 28,565,809 3.3 GB of data 52.86

*HDL/IQ LOAD

Azure File System 

28,565,809  3.3 GB of data  116 (1m 56s) 

*HDL/IQ LOAD

HDL File System

28,565,809  3.3 GB of data  510 (8m 30s) 
HANA INSERT..SELECT  28,565,809  3.3 GB of data  1277 (21m 7s) 

* Does not include the time to export the data from HANA to the file system

Using a TPC-D ORDERS table with 28,565,809 rows which is about 3.3 GB of data.  Loading a small HDL configuration.

This was tested using this HANA Cloud configuration:


HANA 60GB/200GB 4 vCPU

HDL 16TB 8vCPU worker/8vCPU coordinator

In an HDL configuration more vCPUs would have allowed this to run more parallel (especially on wider tables) and by adding more TBs HDL will acquire more disk i/o throughput.

Detail configuration notes and syntax used for testing


Start Hana Cockpit to Manage SAP HANA Cloud

SAP HANA Cloud, SAP HANA Data Lake, SAP HANA Career, SAP HANA Preparation, SAP HANA Career Prep, SAP HANA Jobs, SAP HANA Skills

From Data Lake … choose Open in SAP HANA Database Explorer

You maybe prompted for your HDLADMIN password if this is the first time going here.

SAP HANA Cloud, SAP HANA Data Lake, SAP HANA Career, SAP HANA Preparation, SAP HANA Career Prep, SAP HANA Jobs, SAP HANA Skills

Enter SQL command and click  to execute

SAP HANA Cloud, SAP HANA Data Lake, SAP HANA Career, SAP HANA Preparation, SAP HANA Career Prep, SAP HANA Jobs, SAP HANA Skills

HDL commands for creating:


1. a server connecting to HANA Cloud from HDL,
2. a local HDL table to load the data into and creating
3. a local temporary proxy table pointing to the table in the HANA Cloud instance
 
CREATE SERVER

–DROP SERVER DRHHC2_HDB

CREATE SERVER DRHHC2_HDB CLASS ‘HANAODBC’ USING ‘Driver=libodbcHDB.so;ConnectTimeout=60000;ServerNode=XXXX.hana.prod-us10.hanacloud.ondemand.com:443;ENCRYPT=TRUE;ssltruststore=XXXX.hana.prod-us10.hanacloud.ondemand.com;ssltrustcert=Yes;UID=DBADMIN;PWD=XXXXX;’

CREATE TARGET TABLE

CREATE  TABLE REGIONPULL (

R_REGIONKEY   bigint                  not null,

R_NAME            varchar(25)        not null,

R_COMMENT    varchar(152)      not null,

primary key (R_REGIONKEY)

);

CREATE local temporary PROXY

create existing local temporary table REGION_PROXY (

R_REGIONKEY   bigint                  not null,

R_NAME                          varchar(25)        not null,

R_COMMENT    varchar(152)      not null,

primary key (R_REGIONKEY)

)

at ‘DRHHC2_HDB..TPCD.REGION’;


INSERT DATA

INSERT into REGIONPULL SELECT * from REGION_PROXY;

Commit;

–1.9s

ORDERS table test commands


–DROP TABLE ORDERSPULL;

create table ORDERSPULL (

O_ORDERKEY           BIGINT               not null,

O_CUSTKEY            BIGINT               not null,

O_ORDERSTATUS        VARCHAR(1)           not null,

O_TOTALPRICE         DECIMAL(12,2)        not null,

O_ORDERDATE          DATE                 not null,

O_ORDERPRIORITY      VARCHAR(15)          not null,

O_CLERK              VARCHAR(15)          not null,

O_SHIPPRIORITY       INTEGER              not null,

O_COMMENT            VARCHAR(79)          not null,

primary key (O_ORDERKEY)

);
 
create existing local temporary table ORDERS_PROXY (

O_ORDERKEY           BIGINT               not null,

O_CUSTKEY            BIGINT               not null,

O_ORDERSTATUS        VARCHAR(1)           not null,

O_TOTALPRICE         DECIMAL(12,2)        not null,

O_ORDERDATE          DATE                 not null,

O_ORDERPRIORITY      VARCHAR(15)          not null,

O_CLERK              VARCHAR(15)          not null,

O_SHIPPRIORITY       INTEGER              not null,

O_COMMENT            VARCHAR(79)          not null

)

at ‘DRHHC2_HDB..TPCD.ORDERS’;


INSERT into ORDERSPULL SELECT * from ORDERS_PROXY;

Commit;

–59s

–52.86 s


SELECT COUNT(*) FROM ORDERSPULL;

–28,565,809

LINEITEM table test commands


create table LINEITEM (

L_ORDERKEY           BIGINT               not null,

L_PARTKEY            BIGINT               not null,

L_SUPPKEY            BIGINT               not null,

L_LINENUMBER         INTEGER              not null,

L_QUANTITY           DECIMAL(12,2)        not null,

L_EXTENDEDPRICE      DECIMAL(12,2)        not null,

L_DISCOUNT           DECIMAL(12,2)        not null,

L_TAX                DECIMAL(12,2)        not null,

L_RETURNFLAG         VARCHAR(1)              not null,

L_LINESTATUS         VARCHAR(1)              not null,

L_SHIPDATE           DATE                 not null,

L_COMMITDATE         DATE                 not null,

L_RECEIPTDATE        DATE                 not null,

L_SHIPINSTRUCT       VARCHAR(25)          not null,

L_SHIPMODE           VARCHAR(10)          not null,

L_COMMENT            VARCHAR(44)          not null,

primary key (L_ORDERKEY,L_LINENUMBER)

);

create existing local temporary table LINEITEM_PROXY (

L_ORDERKEY           BIGINT               not null,

L_PARTKEY            BIGINT               not null,

L_SUPPKEY            BIGINT               not null,

L_LINENUMBER         INTEGER              not null,

L_QUANTITY           DECIMAL(12,2)        not null,

L_EXTENDEDPRICE      DECIMAL(12,2)        not null,

L_DISCOUNT           DECIMAL(12,2)        not null,

L_TAX                DECIMAL(12,2)        not null,

L_RETURNFLAG         VARCHAR(1)              not null,

L_LINESTATUS         VARCHAR(1)              not null,

L_SHIPDATE           DATE                 not null,

L_COMMITDATE         DATE                 not null,

L_RECEIPTDATE        DATE                 not null,

L_SHIPINSTRUCT       VARCHAR(25)          not null,

L_SHIPMODE           VARCHAR(10)          not null,

L_COMMENT            VARCHAR(44)          not null

)

at ‘DRHHC2_HDB..TPCD.LINEITEM’;

INSERT into LINEITEM SELECT * from LINEITEM_PROXY;

Commit;

— Rows affected:       114,129,863

— Client elapsed time: 4 m 52 s

No comments:

Post a Comment