Overview
SAP Smart Data Integration provides real time, high speed data provisioning, bulk data movement, and federation. Replication task or flowgraph lets you set up batch or real-time data replication and transformation scenarios in an easy-to-use web application. Data Provisioning Agent hosts data provisioning adapters, enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. Monitors for Data Provisioning Agents, remote subscriptions, and data loads are accessible from the SAP HANA cockpit.
Architecture
In all deployments, the basic components are the same. However, the connections between the components may differ depending on whether SAP HANA is deployed on premise, in the cloud, or behind a firewall. Based on our current SAP HANA landscape we use the following deployment options for SDI.
◉ System 1: Data Provisioning Server (internal component of SAP HANA Server)
◉ System 2: Data Provisioning Agent
◉ System 3: Source system
Yo may notice Data Provisioning Server within SAP HANA connects to the agent using the TCP/IP protocol and default port is 5050
Enable the Data Provisioning Server
You need to enable the data provisioning server in SAP HANA server first and if you have a multi-database container scenario, you need to execute the ALTER DATABASE statement in the system database
ALTER DATABASE <database_name> ADD 'dpserver' AT LOCATION '<hostname>[:<port_number>]'
Open SAP HANA Administration for SYSTEMDB and switch to Configuration tab. Expand the daemon.ini and you can find dpserver.HD2 (your tenant database). The data provisioning server has been successfully enabled for your tenant database
If you want to remove the data provisioning server in a multi-database container scenario you also need to execute the ALTER DATABASE statement in the system database
ALTER DATABASE <database_name> REMOVE 'dpserver' AT LOCATION '<hostname>[:<port_number>]'
Download the DP Agent & Delivery Unit
For POC we use SAP HANA Server 2.00.037 and need to download corresponding version of DP Agent and Delivery Unit. For details, see the SAP HANA smart data integration Product Availability Matrix (PAM). On the SAP Software Download Center, you can find the installation packages in the following locations:
DP Agent
SAP Software Download Center > Software Downloads > Support Packages & Patches > By Alphabetical Index (A-Z) > H > SAP HANA SDI > SAP HANA SDI <version_number> > Comprised Software Component Versions > HANA DP AGENT <version_number>
DP Delivery Unit
SAP Software Download Center > Software Downloads > Installations & Upgrades > By Alphabetical Index (A-Z) H SAP HANA SDI SAP HANA SDI 2.0 > COMPRISED SOFTWARE COMPONENT VERSIONS > HANA DP 2.0 > Click the ZIP file that you need, and save it to your preferred location > In the HANAIMDP<version number>.ZIP file, find and extract the HANA_IM_DP.tgz file
Import DP Delivery Unit
For multi-tenant database container configuration, you must import the delivery unit into the tenant database that you have enabled DP Server in previous step
For simplicity of security I will use SYSTEM user to import from SAP HANA Studio
In the upper left corner, click File Import. On the Import dialog, type delivery into the search box for Select an import source
Click Delivery Unit on the resulting navigation tree and click Next. Then select your target system, and click Next
On the Import Through Delivery Unit dialog, select Client radio and click Browse and navigate to the location where you downloaded the delivery unit (please extract first), select HANAIMDP.tgz, and click Finish.
Install the DP Agent
System Requirements
◉ Windows (64bit) 2016 Service Pack 03
◉ Instance type: AWS m5.2xlarge (32GB)
◉ AWS Instance name: xxxxxx
◉ Access privilege: service account with local administrators group
◉ Firewall port: all open (for simplicity)
Software Requirements
◓ Data Provisioning Agent: HANA DP AGENT <version_number>
◓ Oracle Database Client (12.1.0.2.0) for Microsoft Windows (x64) : Install at C:\Oracle
◓ Oracle TNS file: You can find it at C:\Oracle\Client12102\network\admin\tnsnames.ora
◓ SAP JVM: sapjvm-7.1.066-windows-x64 (https://tools.hana.ondemand.com/#cloud)
◓ Microsoft Visual C++ 2010 Redistributable Package : https://www.microsoft.com/en-us/download/confirmation.aspx?id=14632
◓ Supporting Libraries which are required based on Product Availability Matrix (PAM)
◓ Find the required supporting libraries inside Oracle client folder
◓ ojdbc7.jar: C:\Oracle\Client12102\jdbc\lib
◓ xdb6.jar: C:\Oracle\Client12102\RDBMS\jlib
◓ xmlparserv2.jar: C:\Oracle\Client12102\LIB
Note: You must use the Administrator user or a user in the administrators group to install and configure DP Agent.
Installation Steps:
1. Create a temp folder C:\Install and copy the agent installation file into and extract there. Go inside and find the hdbsetup.exe and right click to run as administrator.
2. Choose install new SAP HANA Data Provisioning Agent and specify the installation path. I left installation with default setting.
3. The user that runs the agent service must have read/write access to the installation directory so that configuration files can be updated.
Default Installation Paths as below
Configure DP Agent
You must configure the Data Provisioning Agent before you can use adapters to connect to data sources, create remote sources, and so on.
Start the configuration tool with run as administrator
C:\usr\sap\dataprovagent\configTool\dpagentconfigtool.exe
Connect to the SAP HANA serverSpecify the host name, port, and HANA Agent Admin User credentials for the SAP HANA server
Register DP Agent to HANA server
Start DP Agent by clicking Start Agent Button.Register OracleLogReaderAdapter to HANA server
Data Provisioning Agent Tuning: For large initial loads, open the DP Agent configuration file C:\usr\sap\dataprovagent\dpagent.ini, configure for 24GB of memory at -Xmx24576m
Disable SSL for Agent to HANA communication on TCP if you don’t have SSL in place. Beyond POC environment, We recommend to install SSL certificate to encrypt the communication
Lastly we need to copy over those supporting libraries into DP Agent library folder C:\usr\sap\dataprovagent\lib
Create a Remote Source
Oracle Database:
◓ User Name – xxxxxx
◓ Password – xxxxxx
◓ Database Service Name – SCUAT
◓ Host name – xxxxxx
To create a remote source we need to provide the following
1. Remote Source Name: ORCL_UAT
2. Adapter Name: OracleLogReaderAdapter
3. Source Location: agent(dpagentorcl)
Also provide the following properties for remote source and some of them are for performance optimization
Lastly, provide Oracle Connection Credential (Technical user)
Test connection to see if it is OK
Oracle Database Permissions
Oracle database users need to have certain permissions granted to them in order to carry out real-time change data capture or batch or initial load transactions.
SDI supports both database-level supplemental logging and table-level supplemental logging. We decide to go table-level supplemental logging.
Set your logging level in the Adapter Preferences window of the Data Provisioning Agent configuration tool for the Oracle Log Reader adapter. Then, run the necessary scripts found in the oracle_init_example.sql file, located in <DPAgent_root>\LogReader\Scripts.
Here is our table-level supplemental logging
-- Database Level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (UNIQUE) COLUMNS;Database User Level
-- Create database user
CREATE USER <user_id> IDENTIFIED BY <password> DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP PROFILE DEFAULT ACCOUNT UNLOCK;
-- Grant two roles for database user
GRANT EXECUTE_CATALOG_ROLE TO <user_id>;
GRANT SELECT_CATALOG_ROLE TO <user_id>;
ALTER USER <user_id> DEFAULT ROLE ALL;
-- Grant 11 system privileges for database user
GRANT ALTER ANY TABLE TO <user_id>;
GRANT CREATE PROCEDURE TO <user_id>;
GRANT CREATE SEQUENCE TO <user_id>;
GRANT CREATE SESSION TO <user_id>;
GRANT CREATE TABLE TO <user_id>;
GRANT CREATE TRIGGER TO <user_id>;
GRANT CREATE VIEW TO <user_id>;
GRANT LOGMINING TO <user_id>;
GRANT SELECT ANY TABLE TO <user_id>;
GRANT SELECT ANY TRANSACTION TO <user_id>;
GRANT UNLIMITED TABLESPACE TO <user_id>;
-- Grant 53 object privileges for database user
GRANT SELECT ON SYS.ATTRIBUTE$ TO <user_id>;
GRANT SELECT ON SYS.CCOL$ TO <user_id>;
GRANT SELECT ON SYS.CDEF$ TO <user_id>;
GRANT SELECT ON SYS.COL$ TO <user_id>;
GRANT SELECT ON SYS.COLLECTION$ TO <user_id>;
GRANT SELECT ON SYS.COLTYPE$ TO <user_id>;
GRANT SELECT ON SYS.CON$ TO <user_id>;
GRANT SELECT ON SYS.DBA_ERRORS TO <user_id>;
GRANT SELECT ON SYS.DBA_LIBRARIES TO <user_id>;
GRANT SELECT ON SYS.DBA_LOG_GROUPS TO <user_id>;
GRANT SELECT ON SYS.DBA_OBJECTS TO <user_id>;
GRANT SELECT ON SYS.DBA_SYNONYMS TO <user_id>;
GRANT SELECT ON SYS.DBA_TABLES TO <user_id>;
GRANT SELECT ON SYS.DBA_TRIGGERS TO <user_id>;
GRANT EXECUTE ON SYS.DBMS_LOGMNR TO <user_id>;
GRANT EXECUTE ON SYS.DBMS_LOGMNR_D TO <user_id>;
GRANT SELECT ON SYS.DEFERRED_STG$ TO <user_id>;
GRANT SELECT ON SYS.GV_$INSTANCE TO <user_id>;
GRANT SELECT ON SYS.GV_$SESSION TO <user_id>;
GRANT SELECT ON SYS.ICOL$ TO <user_id>;
GRANT SELECT ON SYS.IND$ TO <user_id>;
GRANT SELECT ON SYS.INDCOMPART$ TO <user_id>;
GRANT SELECT ON SYS.INDPART$ TO <user_id>;
GRANT SELECT ON SYS.INDSUBPART$ TO <user_id>;
GRANT SELECT ON SYS.LOB$ TO <user_id>;
GRANT SELECT ON SYS.LOBCOMPPART$ TO <user_id>;
GRANT SELECT ON SYS.LOBFRAG$ TO <user_id>;
GRANT SELECT ON SYS.MLOG$ TO <user_id>;
GRANT SELECT ON SYS.NTAB$ TO <user_id>;
GRANT SELECT ON SYS.OBJ$ TO <user_id>;
GRANT SELECT ON SYS.OPQTYPE$ TO <user_id>;
GRANT SELECT ON SYS.PARTOBJ$ TO <user_id>;
GRANT SELECT ON SYS.SEG$ TO <user_id>;
GRANT SELECT ON SYS.SEQ$ TO <user_id>;
GRANT SELECT ON SYS.SNAP$ TO <user_id>;
GRANT SELECT ON SYS.TAB$ TO <user_id>;
GRANT SELECT ON SYS.TABCOMPART$ TO <user_id>;
GRANT SELECT ON SYS.TABPART$ TO <user_id>;
GRANT SELECT ON SYS.TABSUBPART$ TO <user_id>;
GRANT SELECT ON SYS.TS$ TO <user_id>;
GRANT SELECT ON SYS.TYPE$ TO <user_id>;
GRANT SELECT ON SYS.USER$ TO <user_id>;
GRANT SELECT ON SYS.V_$ARCHIVED_LOG TO <user_id>;
GRANT SELECT ON SYS.V_$ARCHIVE_DEST TO <user_id>;
GRANT SELECT ON SYS.V_$DATABASE TO <user_id>;
GRANT SELECT ON SYS.V_$DATABASE_INCARNATION TO <user_id>;
GRANT SELECT ON SYS.V_$INSTANCE TO <user_id>;
GRANT SELECT ON SYS.V_$LOG TO <user_id>;
GRANT SELECT ON SYS.V_$LOGFILE TO <user_id>;
GRANT SELECT ON SYS.V_$LOGMNR_CONTENTS TO <user_id>;
GRANT SELECT ON SYS.V_$LOGMNR_LOGS TO <user_id>;
GRANT SELECT ON SYS.V_$PARAMETER TO <user_id>;
GRANT SELECT ON SYS.V_$TRANSACTION TO <user_id>;
Oracle Table Level
We also enable supplemental logging for table level with primary key and unique index and please run the scripts located in the oracle_init_example.sql file on your Oracle client to set this up.
Data Provisioning Monitoring
We need to grant sap.hana.im.dp.monitor.roles::Monitoring to the user who will perform monitoring tasks.
Subscription Monitor
<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPSubscriptionMonitor
Replication Task Monitor
<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=IMTaskMonitor
DP Agent Monitor
<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPAgentMonitor
Access External Resource
User Provided Service
In order to access the remote source created in previous step inside container, we have to create a user provided service to grant access to the container. Basically anything your container need to access has to be granted explicitly in advance. A hdbgrantor file will also be created in Web IDE so object owner and application user will have permission to access the granted resource. They DON’T HAVE access at default. After this the remote source behaves like local object to container.
Inside your MTA project, you can add reference directly at database module. Right click db module and choose New -> Add SAP HANA Service Connection
Select Non HDI Container since we are adding reference to external system resource not HDI container
For Host information, it is optional to provide since you can add reference to project space mapping tenant database only. You can leave it empty or just localhost. For simplicity I used tenant system administrator to provision Oracle remote source but in real implementation you should create a dedicated users for development and operation support.
Alternatively you can create a user provided service using the either XS CLI or XS Advanced Cockpit
xs cups grant-service -p "{\"host\":\"<hostname>\",\"port\":\"3<Instance Number><15|13>\",\"user\":\"<user>\",\"password\":\"<Password>\",\"driver\":\"com.sap.db.jdbc.Driver\",\"tags\":[\"hana\"] , \"schema\" : \"PLAIN\" }"
In case you create user provided service not from SAP HANA Service Connection, you have to open the project mta.yaml file to add resource manually , then attach it as dependency to db module.
Add grant-service resource
Go to module tab and attach grant-service resource as dependency to db module
You also need to create a hdbgrants file which grant access privileges for the remote source to the object owner and application user of container schema.
Create a Replication Task
To replicate data from objects in a remote source into tables in SAP HANA, you must configure the replication process by creating an .hdbreptask file. After the .hdbreptask has been configured, activate it to generate a stored procedure, a remote subscription, one or more virtual tables for objects that you want to replicate, and target tables. Initial Load Only doesn’t need remote subscription otherwise the remote subscription will always be created. When the stored procedure is called, an initial load is run. When realtime is enabled, then subsequent changes are automatically distributed.
Any change to source data during the initial load will be captured in initial load and after initial load the changes will be replicated in realtime.
DDL changes to source tables that are associated with a replication task will be propagated to SAP HANA if you choose replication with structure so that the same changes will be applied to the SAP HANA target tables.
Note: The Replication Editor is available only starting from SAP Web IDE for SAP HANA with XS Advanced Feature Revision 1.
Enable SAP Web IDE Extensions for SDI
Open the Preferences perspective, in the left sidebar, choose (Preferences).Under Workspace Preferences, choose Extensions.In the Extensions pane to the right, select the toggle button for the extension you want to enable.Choose Save.Refresh your browser.
Add Database Module and Replication Task
It is pretty straightforward to add database module into MTA project
Develop Replication Task
Navigate to the src folder inside your added database module. Choose File New Replication Task. For best practice we create a folder with name repl to hold all replication tasks, and then add a replication task inside this folder
Enter a unique name for the task, and then click Create. Click Connect to a remote source to choose a remote source object.
Select one of the available remote sources, and then click OK.
Now we need to click Add Object to add objects from source database
Click Dictionary Search. Dictionaries can be created on some remote sources so that you can more easily find the objects you want to replicate. You can execute the following statement by building HANA dictionary tables that contain remote source objects.
ALTER REMOVE SOURCE REMOTE_SOURCE_NAME REFRESH OBJECTS
Navigate to the object level and select the remote source objects that you want to replicate, and then click OK.
Option | Description |
Initial load only | Performs a one-time data load without any real-time replication. Always available. |
Initial + Realtime | Performs the initial data load and enables real-time replication. Available when CDC is supported, for tables and virtual tables. |
Realtime | Enables real-time replication without performing an initial data load. Available when CDC is supported, for tables and virtual tables. |
No data transfer | Replicates only the object structure without transferring any data. Always available. |
Initial + realtime with structure | Performs the initial data load, enables real-time replication, and tracks object-level changes. Available when CDC is supported and for tables. |
Realtime only with structure | Enables real-time replication and tracks object-level changes without performing an initial data load. Available when CDC is supported and for tables. |
Click OK to close the Select Remote Objects dialog. The following information is included in the table.
You can sort, filter, or choose to show or hide the column names by clicking any column heading.
he replication task is ready to run at this point. Before running the task, you can edit one or more replication objects by selecting the objects and making changes at the bottom of the screen.
You can also partition, create filters, define target options such as truncating or dropping the target table or changing the load behavior. See more topics in this section for more information.
Target Properties for replicate table
Note: Check on Drop Target Table and Truncate Table option will drop or truncate target table.
Partition Data in a Replication Task
Partitioning data can be helpful when you are initially loading a large data set, because it can improve performance and assist in managing memory usage. Please check further details at Partition Data in a Replication Task
Create an Expression
Use the Expression Editor to create an expression to enhance or filter the data during replication.
Select a remote object.
Click Edit Details.
Click Projection.To create a column mapping expression, choose Target Table Columns to edit or add a column.
Click in the Mapping column to create the expression.To filter the replication data, click Filter Target Table Rows.
Select the columns that you want to use in your expression. You can drag and drop the column names from the list and place it in the Enter filter expression box.Click or type any operators to complete the expression.Save the replication task.
Edit, Add, or Delete Target Columns
Select a remote object. Click Edit Details.
Click Projection Target Table Columns.
Click Add Column .
Enter the column name, select the data type, choose whether you want this column to be the primary key or to allow it to contain a null value. Then create an expression for the mapped value.
Click Validate Syntax to check for errors in the expression, and then click Apply.
Save the replication task
Validate your expression before save
Build Database Module
Choose Build / Build Selected Files.
Virtual table(s): Generated in the specified virtual table schema. You can display the contents of the virtual table in SAP HANA studio.
Remote subscription(s): Generated in the schema selected for the virtual table. This is only generated when the Initial load only option is not selected.
Task(s): Generated in the same schema as the target table.
Target table(s): Populated with the content after execution.
Procedure: Generated in the schema of the target table, the procedure performs three functions.
Execute a Replication Task
Sets the remote subscription to the Queue status. Calls Start Task to perform the initial load of the data. Sets the remote subscription to the Distribute status. Any changes,additions or deletions made to the source data during the initial load are updated in the target system. Any changes to the source data thereafter are updated real time to the target.To begin replicating data, you need to open the container first and find those replication procedures
Click Procedures to display all available replication procedures
Right click the procedure and select Generate Call Statement
Click Run to execute replication task
The replication task begins. To monitor the status, select the replication task name in the workspace list and choose Run Launch Tasks Overview. The Database Explorer shows the tasks that have been processed in the container.
Measuring Latency
You can measure trends by issuing the following statements at HANA server
ALTER REMOTE SOURCE <remote_source_name> START LATENCY MONITORING <latency_ticket_name> [INTERVAL <interval_in_seconds>]
Starts the collection of latency statistics one time or at regular intervals. The user specifies a target latency ticket in the monitoring view.
Let’s take a quick look at result
SELECT * From “SYS”.”M_REMOTE_SOURCE_LATENCY_HISTORY”
The sequence of components from end to end is
SDB (source database) > LRI (LogReader input) > LRO (LogReader output) > SNDR (sender) > FRAMEWORK > RECEIVER > DISTRIBUTOR > APPLIER.
So you can take a look at statistics and find the where is bottleneck for your replication performance
Data Recovery
After a replication task has failed, you can use the lost data tracker in the Data Provisioning Agent command-line configuration tool to identify and correct data inconsistencies that may have occurred.
No comments:
Post a Comment