Wednesday, 15 January 2020

Replicating Data from Oracle to HANA using SAP HANA Web IDE SDI

Overview


SAP Smart Data Integration provides real time, high speed data provisioning, bulk data movement, and federation. Replication task or flowgraph lets you set up batch or real-time data replication and transformation scenarios in an easy-to-use web application. Data Provisioning Agent hosts data provisioning adapters, enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. Monitors for Data Provisioning Agents, remote subscriptions, and data loads are accessible from the SAP HANA cockpit.

Architecture


In all deployments, the basic components are the same. However, the connections between the components may differ depending on whether SAP HANA is deployed on premise, in the cloud, or behind a firewall. Based on our current SAP HANA landscape we use the following deployment options for SDI.

◉ System 1: Data Provisioning Server (internal component of SAP HANA Server)
◉ System 2: Data Provisioning Agent
◉ System 3: Source system

Yo may notice Data Provisioning Server within SAP HANA connects to the agent using the TCP/IP protocol and default port is 5050

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Enable the Data Provisioning Server


You need to enable the data provisioning server in SAP HANA server first and if you have a multi-database container scenario,  you need to execute the ALTER DATABASE statement in the system database 

ALTER DATABASE <database_name> ADD 'dpserver' AT LOCATION '<hostname>[:<port_number>]'

Open SAP HANA Administration for SYSTEMDB and switch to Configuration tab. Expand the daemon.ini and you can find dpserver.HD2 (your tenant database). The data provisioning server has been successfully enabled for your tenant database

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

If you want to remove the data provisioning server in a multi-database container scenario you also need to execute the ALTER DATABASE statement in the system database

ALTER DATABASE <database_name> REMOVE 'dpserver' AT LOCATION '<hostname>[:<port_number>]'

Download the DP Agent & Delivery Unit


For POC we use SAP HANA Server 2.00.037 and need to download corresponding version of DP Agent and Delivery Unit. For details, see the SAP HANA smart data integration Product Availability Matrix (PAM). On the SAP Software Download Center, you can find the installation packages in the following locations:

DP Agent

SAP Software Download Center > Software Downloads > Support Packages & Patches > By Alphabetical Index (A-Z) > H > SAP HANA SDI > SAP HANA SDI <version_number> > Comprised Software Component Versions > HANA DP AGENT <version_number>

DP Delivery Unit

SAP Software Download Center > Software Downloads > Installations & Upgrades > By Alphabetical Index (A-Z)  H  SAP HANA SDI  SAP HANA SDI 2.0 > COMPRISED SOFTWARE COMPONENT VERSIONS > HANA DP 2.0 > Click the ZIP file that you need, and save it to your preferred location > In the HANAIMDP<version number>.ZIP file, find and extract the HANA_IM_DP.tgz file

Import DP Delivery Unit


For multi-tenant database container configuration, you must import the delivery unit into the tenant database that you have enabled DP Server in previous step

For simplicity of security I will use SYSTEM user to import from SAP HANA Studio

In the upper left corner, click File Import. On the Import dialog, type delivery into the search box for Select an import source

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Click Delivery Unit on the resulting navigation tree and click Next. Then select your target system, and click Next

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

On the Import Through Delivery Unit dialog, select Client radio and click Browse and navigate to the location where you downloaded the delivery unit (please extract first), select HANAIMDP.tgz, and click Finish.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Install the DP Agent


System Requirements

◉ Windows (64bit) 2016 Service Pack 03
◉ Instance type: AWS m5.2xlarge (32GB)
◉ AWS Instance name: xxxxxx
◉ Access privilege: service account with local administrators group
◉ Firewall port: all open (for simplicity)

Software Requirements


◓ Data Provisioning Agent: HANA DP AGENT <version_number>

◓ Oracle Database Client (12.1.0.2.0) for Microsoft Windows (x64) :  Install at C:\Oracle

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

◓ Oracle TNS file:  You can find it at C:\Oracle\Client12102\network\admin\tnsnames.ora

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

◓ SAP JVM: sapjvm-7.1.066-windows-x64 (https://tools.hana.ondemand.com/#cloud)

◓ Microsoft Visual C++ 2010 Redistributable Package :  https://www.microsoft.com/en-us/download/confirmation.aspx?id=14632

◓ Supporting Libraries which are required based on Product Availability Matrix (PAM)    

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

◓ Find the required supporting libraries inside Oracle client folder

     ◓ ojdbc7.jar:  C:\Oracle\Client12102\jdbc\lib
     ◓ xdb6.jar: C:\Oracle\Client12102\RDBMS\jlib
     ◓ xmlparserv2.jar: C:\Oracle\Client12102\LIB

Note: You must use the Administrator user or a user in the administrators group to install and configure DP Agent.

Installation Steps:

1. Create a temp folder C:\Install and copy the agent installation file into and extract there. Go inside and find the hdbsetup.exe and right click to run as administrator.

2. Choose install new SAP HANA Data Provisioning Agent and specify the installation path. I left installation with default setting.

3. The user that runs the agent service must have read/write access to the installation directory so that configuration files can be updated.

Default Installation Paths as below

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Configure DP Agent


You must configure the Data Provisioning Agent before you can use adapters to connect to data sources, create remote sources, and so on.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Start the configuration tool with run as administrator

C:\usr\sap\dataprovagent\configTool\dpagentconfigtool.exe

Connect to the SAP HANA serverSpecify the host name, port, and HANA Agent Admin User credentials for the SAP HANA server

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Register DP Agent to HANA server

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Start DP Agent by clicking Start Agent Button.Register OracleLogReaderAdapter to HANA server

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Data Provisioning Agent Tuning: For large initial loads, open the DP Agent configuration file  C:\usr\sap\dataprovagent\dpagent.ini, configure for 24GB of memory at -Xmx24576m

Disable SSL for Agent to HANA communication on TCP if you don’t have SSL in place.  Beyond POC environment, We recommend to install SSL certificate to encrypt the communication

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Lastly we need to copy over those supporting libraries into DP Agent library folder C:\usr\sap\dataprovagent\lib

Create a Remote Source


Oracle Database:

◓ User Name – xxxxxx
◓ Password – xxxxxx
◓ Database Service Name – SCUAT
◓ Host name – xxxxxx

To create a remote source we need to provide the following

1. Remote Source Name: ORCL_UAT
2. Adapter Name: OracleLogReaderAdapter
3. Source Location: agent(dpagentorcl)

Also provide the following properties for remote source and some of them are for performance optimization

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Lastly, provide Oracle Connection Credential (Technical user)

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Test connection to see if it is OK

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Oracle Database Permissions


Oracle database users need to have certain permissions granted to them in order to carry out real-time change data capture or batch or initial load transactions. 

SDI supports both database-level supplemental logging and table-level supplemental logging. We decide to go table-level supplemental logging.

Set your logging level in the Adapter Preferences window of the Data Provisioning Agent configuration tool for the Oracle Log Reader adapter. Then, run the necessary scripts found in the oracle_init_example.sql file, located in <DPAgent_root>\LogReader\Scripts.
Here is our table-level supplemental logging

-- Database Level

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (UNIQUE) COLUMNS;Database User Level

-- Create database user

CREATE USER <user_id> IDENTIFIED BY <password> DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP PROFILE DEFAULT ACCOUNT UNLOCK;

-- Grant two roles for database user

GRANT EXECUTE_CATALOG_ROLE TO <user_id>;
GRANT SELECT_CATALOG_ROLE TO <user_id>;
ALTER USER <user_id> DEFAULT ROLE ALL;

-- Grant 11 system privileges for database user

GRANT ALTER ANY TABLE TO <user_id>;
GRANT CREATE PROCEDURE TO <user_id>;
GRANT CREATE SEQUENCE TO <user_id>;
GRANT CREATE SESSION TO <user_id>;
GRANT CREATE TABLE TO <user_id>;
GRANT CREATE TRIGGER TO <user_id>;
GRANT CREATE VIEW TO <user_id>;
GRANT LOGMINING TO <user_id>;
GRANT SELECT ANY TABLE TO <user_id>;
GRANT SELECT ANY TRANSACTION TO <user_id>;
GRANT UNLIMITED TABLESPACE TO <user_id>;

-- Grant 53 object privileges for database user

GRANT SELECT ON SYS.ATTRIBUTE$ TO <user_id>;
GRANT SELECT ON SYS.CCOL$ TO <user_id>;
GRANT SELECT ON SYS.CDEF$ TO <user_id>;
GRANT SELECT ON SYS.COL$ TO <user_id>;
GRANT SELECT ON SYS.COLLECTION$ TO <user_id>;
GRANT SELECT ON SYS.COLTYPE$ TO <user_id>;
GRANT SELECT ON SYS.CON$ TO <user_id>;
GRANT SELECT ON SYS.DBA_ERRORS TO <user_id>;
GRANT SELECT ON SYS.DBA_LIBRARIES TO <user_id>;
GRANT SELECT ON SYS.DBA_LOG_GROUPS TO <user_id>;
GRANT SELECT ON SYS.DBA_OBJECTS TO <user_id>;
GRANT SELECT ON SYS.DBA_SYNONYMS TO <user_id>;
GRANT SELECT ON SYS.DBA_TABLES TO <user_id>;
GRANT SELECT ON SYS.DBA_TRIGGERS TO <user_id>;
GRANT EXECUTE ON SYS.DBMS_LOGMNR TO <user_id>;
GRANT EXECUTE ON SYS.DBMS_LOGMNR_D TO <user_id>;
GRANT SELECT ON SYS.DEFERRED_STG$ TO <user_id>;
GRANT SELECT ON SYS.GV_$INSTANCE TO <user_id>;
GRANT SELECT ON SYS.GV_$SESSION TO <user_id>;
GRANT SELECT ON SYS.ICOL$ TO <user_id>;
GRANT SELECT ON SYS.IND$ TO <user_id>;
GRANT SELECT ON SYS.INDCOMPART$ TO <user_id>;
GRANT SELECT ON SYS.INDPART$ TO <user_id>;
GRANT SELECT ON SYS.INDSUBPART$ TO <user_id>;
GRANT SELECT ON SYS.LOB$ TO <user_id>;
GRANT SELECT ON SYS.LOBCOMPPART$ TO <user_id>;
GRANT SELECT ON SYS.LOBFRAG$ TO <user_id>;
GRANT SELECT ON SYS.MLOG$ TO <user_id>;
GRANT SELECT ON SYS.NTAB$ TO <user_id>;
GRANT SELECT ON SYS.OBJ$ TO <user_id>;
GRANT SELECT ON SYS.OPQTYPE$ TO <user_id>;
GRANT SELECT ON SYS.PARTOBJ$ TO <user_id>;
GRANT SELECT ON SYS.SEG$ TO <user_id>;
GRANT SELECT ON SYS.SEQ$ TO <user_id>;
GRANT SELECT ON SYS.SNAP$ TO <user_id>;
GRANT SELECT ON SYS.TAB$ TO <user_id>;
GRANT SELECT ON SYS.TABCOMPART$ TO <user_id>;
GRANT SELECT ON SYS.TABPART$ TO <user_id>;
GRANT SELECT ON SYS.TABSUBPART$ TO <user_id>;
GRANT SELECT ON SYS.TS$ TO <user_id>;
GRANT SELECT ON SYS.TYPE$ TO <user_id>;
GRANT SELECT ON SYS.USER$ TO <user_id>;
GRANT SELECT ON SYS.V_$ARCHIVED_LOG TO <user_id>;
GRANT SELECT ON SYS.V_$ARCHIVE_DEST TO <user_id>;
GRANT SELECT ON SYS.V_$DATABASE TO <user_id>;
GRANT SELECT ON SYS.V_$DATABASE_INCARNATION TO <user_id>;
GRANT SELECT ON SYS.V_$INSTANCE TO <user_id>;
GRANT SELECT ON SYS.V_$LOG TO <user_id>;
GRANT SELECT ON SYS.V_$LOGFILE TO <user_id>;
GRANT SELECT ON SYS.V_$LOGMNR_CONTENTS TO <user_id>;
GRANT SELECT ON SYS.V_$LOGMNR_LOGS TO <user_id>;
GRANT SELECT ON SYS.V_$PARAMETER TO <user_id>;
GRANT SELECT ON SYS.V_$TRANSACTION TO <user_id>;

Oracle Table Level

We also enable supplemental  logging for table level with primary key and unique index and please run the scripts located in the oracle_init_example.sql file on your Oracle client to set this up.

Data Provisioning Monitoring


We need to grant sap.hana.im.dp.monitor.roles::Monitoring to the user who will perform monitoring tasks.

Subscription Monitor

<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPSubscriptionMonitor


SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Replication Task Monitor

<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=IMTaskMonitor

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

DP Agent Monitor

<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPAgentMonitor

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Access External Resource


User Provided Service


In order to access the remote source created in previous step inside container, we have to create a user provided service to grant access to the container. Basically anything your container need to access has to be granted explicitly in advance. A hdbgrantor file will also be created in Web IDE so object owner and application user will have permission to access the granted resource. They DON’T HAVE access at default. After this the remote source behaves like local object to container.

Inside your MTA project, you can add reference directly at database module. Right click db module and choose New -> Add SAP HANA Service Connection

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Select Non HDI Container since we are adding reference to external system resource not HDI container

For Host information, it is optional to provide since you can add reference to project space mapping tenant database only. You can leave it empty or just localhost. For simplicity I used tenant system administrator to provision Oracle remote source but in real implementation you should create a dedicated users for development and operation support.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Alternatively you can create a user provided service using the either XS CLI or XS Advanced Cockpit

xs cups grant-service -p "{\"host\":\"<hostname>\",\"port\":\"3<Instance Number><15|13>\",\"user\":\"<user>\",\"password\":\"<Password>\",\"driver\":\"com.sap.db.jdbc.Driver\",\"tags\":[\"hana\"] , \"schema\" : \"PLAIN\" }"

In case you create user provided service not from SAP HANA Service Connection, you have to open the project mta.yaml file to add resource manually , then attach it as dependency to db module.

Add grant-service resource

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Go to module tab and attach grant-service resource as dependency to db module

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

You also need to create a hdbgrants file which grant access privileges for the remote source to the object owner and application user of container schema.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Create a Replication Task


To replicate data from objects in a remote source into tables in SAP HANA, you must configure the replication process by creating an .hdbreptask file. After the .hdbreptask has been configured, activate it to generate a stored procedure, a remote subscription, one or more virtual tables for objects that you want to replicate, and target tables. Initial Load Only doesn’t need remote subscription otherwise the remote subscription will always be created. When the stored procedure is called, an initial load is run. When realtime is enabled, then subsequent changes are automatically distributed.

Any change to source data during the initial load will be captured in initial load and after initial load the changes will be replicated in realtime.

DDL changes to source tables that are associated with a replication task will be propagated to SAP HANA if you choose replication with structure so that the same changes will be applied to the SAP HANA target tables.

Note: The Replication Editor is available only starting from SAP Web IDE for SAP HANA with XS Advanced Feature Revision 1.

Enable SAP Web IDE Extensions for SDI


Open the Preferences perspective, in the left sidebar, choose (Preferences).Under Workspace Preferences, choose Extensions.In the Extensions pane to the right, select the toggle button for the extension you want to enable.Choose Save.Refresh your browser.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Add Database Module and Replication Task


It is pretty straightforward to add database module into MTA project

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Develop Replication Task


Navigate to the src folder inside your added database module. Choose File New  Replication Task. For best practice we create a folder with name repl to hold all replication tasks, and then add a replication task inside this folder        

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Enter a unique name for the task, and then click Create. Click Connect to a remote source to choose a remote source object.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Select one of the available remote sources, and then click OK.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Now we need to click Add Object to add objects from source database  

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Click Dictionary Search. Dictionaries can be created on some remote sources so that you can more easily find the objects you want to replicate. You can execute the following statement by building HANA dictionary tables that contain remote source objects.

ALTER REMOVE SOURCE  REMOTE_SOURCE_NAME REFRESH OBJECTS

Navigate to the object level and select the remote source objects that you want to replicate, and then click OK.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Option Description
Initial load only Performs a one-time data load without any real-time replication. Always available. 
Initial + Realtime Performs the initial data load and enables real-time replication. Available when CDC is supported, for tables and virtual tables. 
Realtime Enables real-time replication without performing an initial data load. Available when CDC is supported, for tables and virtual tables. 
No data transfer Replicates only the object structure without transferring any data. Always available.
Initial + realtime with structure Performs the initial data load, enables real-time replication, and tracks object-level changes. Available when CDC is supported and for tables. 
Realtime only with structure Enables real-time replication and tracks object-level changes without performing an initial data load. Available when CDC is supported and for tables. 

Click OK to close the Select Remote Objects dialog. The following information is included in the table.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

You can sort, filter, or choose to show or hide the column names by clicking any column heading.   

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

he replication task is ready to run at this point. Before running the task, you can edit one or more replication objects by selecting the objects and making changes at the bottom of the screen.

You can also partition, create filters, define target options such as truncating or dropping the target table or changing the load behavior. See more topics in this section for more information.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Target Properties for replicate table  

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Note: Check on Drop Target Table and Truncate Table option will drop or truncate target table.

Partition Data in a Replication Task


Partitioning data can be helpful when you are initially loading a large data set, because it can            improve performance and assist in managing memory usage. Please check further details at            Partition Data in a Replication Task

Create an Expression


Use the Expression Editor to create an expression to enhance or filter the data during replication.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Select a remote object.

Click Edit Details.

Click Projection.To create a column mapping expression, choose Target Table Columns to edit or add a column.

Click in the Mapping column to create the expression.To filter the replication data, click Filter Target Table Rows.

Select the columns that you want to use in your expression. You can drag and drop the column names from the list and place it in the Enter filter expression box.Click or type any operators to complete the expression.Save the replication task.

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Edit, Add, or Delete Target Columns


Select a remote object. Click Edit Details.

Click Projection  Target Table Columns.

Click Add Column .

Enter the column name, select the data type, choose whether you want this column to be the primary key or to allow it to contain a null value. Then create an expression for the mapped value.

Click Validate Syntax to check for errors in the expression, and then click Apply.

Save the replication task

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Validate your expression before save

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Build Database Module


Choose Build / Build Selected Files.

Virtual table(s): Generated in the specified virtual table schema. You can display the contents of the virtual table in SAP HANA studio.

Remote subscription(s): Generated in the schema selected for the virtual table. This is only generated when the Initial load only option is not selected.

Task(s): Generated in the same schema as the target table.

Target table(s): Populated with the content after execution.

Procedure: Generated in the schema of the target table, the procedure performs three functions.       

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Execute a Replication Task


Sets the remote subscription to the Queue status. Calls Start Task to perform the initial load of            the data. Sets the remote subscription to the Distribute status. Any changes,additions or                      deletions made to the source data during the initial load are updated in the target system. Any            changes to the source data thereafter are updated real time to the target.To begin replicating              data, you need to open the container first and find those replication procedures

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Click Procedures to display all available replication procedures

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Right click the procedure and select Generate Call Statement

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

Click Run to execute replication task

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

The replication task begins. To monitor the status, select the replication task name in the workspace list and choose Run  Launch Tasks Overview. The Database Explorer shows the tasks that have been processed in the container.

Measuring Latency


You can measure trends by issuing the following statements at HANA server

ALTER REMOTE SOURCE <remote_source_name> START LATENCY MONITORING <latency_ticket_name> [INTERVAL <interval_in_seconds>]

Starts the collection of latency statistics one time or at regular intervals. The user specifies a target latency ticket in the monitoring view.

Let’s take a quick look at result

SELECT * From “SYS”.”M_REMOTE_SOURCE_LATENCY_HISTORY”

SAP HANA smart data integration, SAP HANA, SAP Web IDE, SAP HANA Learning

The sequence of components from end to end is

SDB (source database) > LRI (LogReader input) > LRO (LogReader output) > SNDR (sender) > FRAMEWORK > RECEIVER > DISTRIBUTOR > APPLIER.

So you can take a look at statistics and find the where is bottleneck for your replication performance

Data Recovery


After a replication task has failed, you can use the lost data tracker in the Data Provisioning Agent command-line configuration tool to identify and correct data inconsistencies that may have occurred. 

No comments:

Post a Comment