Overview SAP Smart Data Integration provides real time, high speed data provisioning, bulk data movement, and federation. Replication task or flowgraph lets you set up batch or real-time data replication and transformation scenarios in an easy-to-use web application. Data Provisioning Agent hosts data provisioning adapters, enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. Monitors for Data Provisioning Agents, remote subscriptions, and data loads are accessible from the SAP HANA cockpit. Architecture In all deployments, the basic components are the same. However, the connections between the components may differ depending on whether SAP HANA is deployed on premise, in the cloud, or behind a firewall. Based on our current SAP HANA landscape we use the following deployment options for SDI. ◉ System 1: Data Provisioning Server (internal component of SAP HANA Server) ◉ System 2: Data Provisioning Agent ◉ System 3: Source system Yo may notice Data Provisioning Server within SAP HANA connects to the agent using the TCP/IP protocol and default port is 5050 Enable the Data Provisioning Server You need to enable the data provisioning server in SAP HANA server first and if you have a multi-database container scenario, you need to execute the ALTER DATABASE statement in the system database ALTER DATABASE ADD 'dpserver' AT LOCATION '[:]' Open SAP HANA Administration for SYSTEMDB and switch to Configuration tab. Expand the daemon.ini and you can find dpserver.HD2 (your tenant database). The data provisioning server has been successfully enabled for your tenant database If you want to remove the data provisioning server in a multi-database container scenario you also need to execute the ALTER DATABASE statement in the system database ALTER DATABASE REMOVE 'dpserver' AT LOCATION '[:]' Download the DP Agent & Delivery Unit For POC we use SAP HANA Server 2.00.037 and need to download corresponding version of DP Agent and Delivery Unit. For details, see the SAP HANA smart data integration Product Availability Matrix (PAM). On the SAP Software Download Center, you can find the installation packages in the following locations: DP Agent SAP Software Download Center > Software Downloads > Support Packages & Patches > By Alphabetical Index (A-Z) > H > SAP HANA SDI > SAP HANA SDI > Comprised Software Component Versions > HANA DP AGENT DP Delivery Unit SAP Software Download Center > Software Downloads > Installations & Upgrades > By Alphabetical Index (A-Z) H SAP HANA SDI SAP HANA SDI 2.0 > COMPRISED SOFTWARE COMPONENT VERSIONS > HANA DP 2.0 > Click the ZIP file that you need, and save it to your preferred location > In the HANAIMDP.ZIP file, find and extract the HANA_IM_DP.tgz file Import DP Delivery Unit For multi-tenant database container configuration, you must import the delivery unit into the tenant database that you have enabled DP Server in previous step For simplicity of security I will use SYSTEM user to import from SAP HANA Studio In the upper left corner, click File Import. On the Import dialog, type delivery into the search box for Select an import source Click Delivery Unit on the resulting navigation tree and click Next. Then select your target system, and click Next On the Import Through Delivery Unit dialog, select Client radio and click Browse and navigate to the location where you downloaded the delivery unit (please extract first), select HANAIMDP.tgz, and click Finish. Install the DP Agent System Requirements ◉ Windows (64bit) 2016 Service Pack 03 ◉ Instance type: AWS m5.2xlarge (32GB) ◉ AWS Instance name: xxxxxx ◉ Access privilege: service account with local administrators group ◉ Firewall port: all open (for simplicity) Software Requirements ◓ Data Provisioning Agent: HANA DP AGENT ◓ Oracle Database Client (12.1.0.2.0) for Microsoft Windows (x64) : Install at C:\Oracle ◓ Oracle TNS file: You can find it at C:\Oracle\Client12102\network\admin\tnsnames.ora ◓ SAP JVM: sapjvm-7.1.066-windows-x64 (https://tools.hana.ondemand.com/#cloud) ◓ Microsoft Visual C++ 2010 Redistributable Package : https://www.microsoft.com/en-us/download/confirmation.aspx?id=14632 ◓ Supporting Libraries which are required based on Product Availability Matrix (PAM) ◓ Find the required supporting libraries inside Oracle client folder ◓ ojdbc7.jar: C:\Oracle\Client12102\jdbc\lib ◓ xdb6.jar: C:\Oracle\Client12102\RDBMS\jlib ◓ xmlparserv2.jar: C:\Oracle\Client12102\LIB Note: You must use the Administrator user or a user in the administrators group to install and configure DP Agent. Installation Steps: 1. Create a temp folder C:\Install and copy the agent installation file into and extract there. Go inside and find the hdbsetup.exe and right click to run as administrator. 2. Choose install new SAP HANA Data Provisioning Agent and specify the installation path. I left installation with default setting. 3. The user that runs the agent service must have read/write access to the installation directory so that configuration files can be updated. Default Installation Paths as below Configure DP Agent You must configure the Data Provisioning Agent before you can use adapters to connect to data sources, create remote sources, and so on. Start the configuration tool with run as administrator C:\usr\sap\dataprovagent\configTool\dpagentconfigtool.exe Connect to the SAP HANA serverSpecify the host name, port, and HANA Agent Admin User credentials for the SAP HANA server Register DP Agent to HANA server Start DP Agent by clicking Start Agent Button.Register OracleLogReaderAdapter to HANA server Data Provisioning Agent Tuning: For large initial loads, open the DP Agent configuration file C:\usr\sap\dataprovagent\dpagent.ini, configure for 24GB of memory at -Xmx24576m Disable SSL for Agent to HANA communication on TCP if you don’t have SSL in place. Beyond POC environment, We recommend to install SSL certificate to encrypt the communication Lastly we need to copy over those supporting libraries into DP Agent library folder C:\usr\sap\dataprovagent\lib Create a Remote Source Oracle Database: ◓ User Name – xxxxxx ◓ Password – xxxxxx ◓ Database Service Name – SCUAT ◓ Host name – xxxxxx To create a remote source we need to provide the following 1. Remote Source Name: ORCL_UAT 2. Adapter Name: OracleLogReaderAdapter 3. Source Location: agent(dpagentorcl) Also provide the following properties for remote source and some of them are for performance optimization Lastly, provide Oracle Connection Credential (Technical user) Test connection to see if it is OK Oracle Database Permissions Oracle database users need to have certain permissions granted to them in order to carry out real-time change data capture or batch or initial load transactions. SDI supports both database-level supplemental logging and table-level supplemental logging. We decide to go table-level supplemental logging. Set your logging level in the Adapter Preferences window of the Data Provisioning Agent configuration tool for the Oracle Log Reader adapter. Then, run the necessary scripts found in the oracle_init_example.sql file, located in \LogReader\Scripts. Here is our table-level supplemental logging -- Database Level ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS; ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (UNIQUE) COLUMNS;Database User Level -- Create database user CREATE USER IDENTIFIED BY DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP PROFILE DEFAULT ACCOUNT UNLOCK; -- Grant two roles for database user GRANT EXECUTE_CATALOG_ROLE TO ; GRANT SELECT_CATALOG_ROLE TO ; ALTER USER DEFAULT ROLE ALL; -- Grant 11 system privileges for database user GRANT ALTER ANY TABLE TO ; GRANT CREATE PROCEDURE TO ; GRANT CREATE SEQUENCE TO ; GRANT CREATE SESSION TO ; GRANT CREATE TABLE TO ; GRANT CREATE TRIGGER TO ; GRANT CREATE VIEW TO ; GRANT LOGMINING TO ; GRANT SELECT ANY TABLE TO ; GRANT SELECT ANY TRANSACTION TO ; GRANT UNLIMITED TABLESPACE TO ; -- Grant 53 object privileges for database user GRANT SELECT ON SYS.ATTRIBUTE$ TO ; GRANT SELECT ON SYS.CCOL$ TO ; GRANT SELECT ON SYS.CDEF$ TO ; GRANT SELECT ON SYS.COL$ TO ; GRANT SELECT ON SYS.COLLECTION$ TO ; GRANT SELECT ON SYS.COLTYPE$ TO ; GRANT SELECT ON SYS.CON$ TO ; GRANT SELECT ON SYS.DBA_ERRORS TO ; GRANT SELECT ON SYS.DBA_LIBRARIES TO ; GRANT SELECT ON SYS.DBA_LOG_GROUPS TO ; GRANT SELECT ON SYS.DBA_OBJECTS TO ; GRANT SELECT ON SYS.DBA_SYNONYMS TO ; GRANT SELECT ON SYS.DBA_TABLES TO ; GRANT SELECT ON SYS.DBA_TRIGGERS TO ; GRANT EXECUTE ON SYS.DBMS_LOGMNR TO ; GRANT EXECUTE ON SYS.DBMS_LOGMNR_D TO ; GRANT SELECT ON SYS.DEFERRED_STG$ TO ; GRANT SELECT ON SYS.GV_$INSTANCE TO ; GRANT SELECT ON SYS.GV_$SESSION TO ; GRANT SELECT ON SYS.ICOL$ TO ; GRANT SELECT ON SYS.IND$ TO ; GRANT SELECT ON SYS.INDCOMPART$ TO ; GRANT SELECT ON SYS.INDPART$ TO ; GRANT SELECT ON SYS.INDSUBPART$ TO ; GRANT SELECT ON SYS.LOB$ TO ; GRANT SELECT ON SYS.LOBCOMPPART$ TO ; GRANT SELECT ON SYS.LOBFRAG$ TO ; GRANT SELECT ON SYS.MLOG$ TO ; GRANT SELECT ON SYS.NTAB$ TO ; GRANT SELECT ON SYS.OBJ$ TO ; GRANT SELECT ON SYS.OPQTYPE$ TO ; GRANT SELECT ON SYS.PARTOBJ$ TO ; GRANT SELECT ON SYS.SEG$ TO ; GRANT SELECT ON SYS.SEQ$ TO ; GRANT SELECT ON SYS.SNAP$ TO ; GRANT SELECT ON SYS.TAB$ TO ; GRANT SELECT ON SYS.TABCOMPART$ TO ; GRANT SELECT ON SYS.TABPART$ TO ; GRANT SELECT ON SYS.TABSUBPART$ TO ; GRANT SELECT ON SYS.TS$ TO ; GRANT SELECT ON SYS.TYPE$ TO ; GRANT SELECT ON SYS.USER$ TO ; GRANT SELECT ON SYS.V_$ARCHIVED_LOG TO ; GRANT SELECT ON SYS.V_$ARCHIVE_DEST TO ; GRANT SELECT ON SYS.V_$DATABASE TO ; GRANT SELECT ON SYS.V_$DATABASE_INCARNATION TO ; GRANT SELECT ON SYS.V_$INSTANCE TO ; GRANT SELECT ON SYS.V_$LOG TO ; GRANT SELECT ON SYS.V_$LOGFILE TO ; GRANT SELECT ON SYS.V_$LOGMNR_CONTENTS TO ; GRANT SELECT ON SYS.V_$LOGMNR_LOGS TO ; GRANT SELECT ON SYS.V_$PARAMETER TO ; GRANT SELECT ON SYS.V_$TRANSACTION TO ; Oracle Table Level We also enable supplemental logging for table level with primary key and unique index and please run the scripts located in the oracle_init_example.sql file on your Oracle client to set this up. Data Provisioning Monitoring We need to grant sap.hana.im.dp.monitor.roles::Monitoring to the user who will perform monitoring tasks. Subscription Monitor :80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPSubscriptionMonitor Replication Task Monitor :80<2 digit instance number>/sap/hana/im/dp/monitor/?view=IMTaskMonitor DP Agent Monitor :80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPAgentMonitor Access External Resource User Provided Service In order to access the remote source created in previous step inside container, we have to create a user provided service to grant access to the container. Basically anything your container need to access has to be granted explicitly in advance. A hdbgrantor file will also be created in Web IDE so object owner and application user will have permission to access the granted resource. They DON’T HAVE access at default. After this the remote source behaves like local object to container. Inside your MTA project, you can add reference directly at database module. Right click db module and choose New -> Add SAP HANA Service Connection Select Non HDI Container since we are adding reference to external system resource not HDI container For Host information, it is optional to provide since you can add reference to project space mapping tenant database only. You can leave it empty or just localhost. For simplicity I used tenant system administrator to provision Oracle remote source but in real implementation you should create a dedicated users for development and operation support. Alternatively you can create a user provided service using the either XS CLI or XS Advanced Cockpit xs cups grant-service -p "{"host":"","port":"3<15|13>","user":"","password":"","driver":"com.sap.db.jdbc.Driver","tags":["hana"] , "schema" : "PLAIN" }" In case you create user provided service not from SAP HANA Service Connection, you have to open the project mta.yaml file to add resource manually , then attach it as dependency to db module. Add grant-service resource Go to module tab and attach grant-service resource as dependency to db module You also need to create a hdbgrants file which grant access privileges for the remote source to the object owner and application user of container schema. Create a Replication Task To replicate data from objects in a remote source into tables in SAP HANA, you must configure the replication process by creating an .hdbreptask file. After the .hdbreptask has been configured, activate it to generate a stored procedure, a remote subscription, one or more virtual tables for objects that you want to replicate, and target tables. Initial Load Only doesn’t need remote subscription otherwise the remote subscription will always be created. When the stored procedure is called, an initial load is run. When realtime is enabled, then subsequent changes are automatically distributed. Any change to source data during the initial load will be captured in initial load and after initial load the changes will be replicated in realtime. DDL changes to source tables that are associated with a replication task will be propagated to SAP HANA if you choose replication with structure so that the same changes will be applied to the SAP HANA target tables. Note: The Replication Editor is available only starting from SAP Web IDE for SAP HANA with XS Advanced Feature Revision 1. Enable SAP Web IDE Extensions for SDI Open the Preferences perspective, in the left sidebar, choose (Preferences).Under Workspace Preferences, choose Extensions.In the Extensions pane to the right, select the toggle button for the extension you want to enable.Choose Save.Refresh your browser. Add Database Module and Replication Task It is pretty straightforward to add database module into MTA project Develop Replication Task Navigate to the src folder inside your added database module. Choose File New Replication Task. For best practice we create a folder with name repl to hold all replication tasks, and then add a replication task inside this folder Enter a unique name for the task, and then click Create. Click Connect to a remote source to choose a remote source object. Select one of the available remote sources, and then click OK. Now we need to click Add Object to add objects from source database Click Dictionary Search. Dictionaries can be created on some remote sources so that you can more easily find the objects you want to replicate. You can execute the following statement by building HANA dictionary tables that contain remote source objects. ALTER REMOVE SOURCE REMOTE_SOURCE_NAME REFRESH OBJECTS Navigate to the object level and select the remote source objects that you want to replicate, and then click OK. Option Description Initial load only Performs a one-time data load without any real-time replication. Always available. Initial + Realtime Performs the initial data load and enables real-time replication. Available when CDC is supported, for tables and virtual tables. Realtime Enables real-time replication without performing an initial data load. Available when CDC is supported, for tables and virtual tables. No data transfer Replicates only the object structure without transferring any data. Always available. Initial + realtime with structure Performs the initial data load, enables real-time replication, and tracks object-level changes. Available when CDC is supported and for tables. Realtime only with structure Enables real-time replication and tracks object-level changes without performing an initial data load. Available when CDC is supported and for tables. Click OK to close the Select Remote Objects dialog. The following information is included in the table. You can sort, filter, or choose to show or hide the column names by clicking any column heading. he replication task is ready to run at this point. Before running the task, you can edit one or more replication objects by selecting the objects and making changes at the bottom of the screen. You can also partition, create filters, define target options such as truncating or dropping the target table or changing the load behavior. See more topics in this section for more information. Target Properties for replicate table Note: Check on Drop Target Table and Truncate Table option will drop or truncate target table. Partition Data in a Replication Task Partitioning data can be helpful when you are initially loading a large data set, because it can improve performance and assist in managing memory usage. Please check further details at Partition Data in a Replication Task Create an Expression Use the Expression Editor to create an expression to enhance or filter the data during replication. Select a remote object. Click Edit Details. Click Projection.To create a column mapping expression, choose Target Table Columns to edit or add a column. Click in the Mapping column to create the expression.To filter the replication data, click Filter Target Table Rows. Select the columns that you want to use in your expression. You can drag and drop the column names from the list and place it in the Enter filter expression box.Click or type any operators to complete the expression.Save the replication task. Edit, Add, or Delete Target Columns Select a remote object. Click Edit Details. Click Projection Target Table Columns. Click Add Column . Enter the column name, select the data type, choose whether you want this column to be the primary key or to allow it to contain a null value. Then create an expression for the mapped value. Click Validate Syntax to check for errors in the expression, and then click Apply. Save the replication task Validate your expression before save Build Database Module Choose Build / Build Selected Files. Virtual table(s): Generated in the specified virtual table schema. You can display the contents of the virtual table in SAP HANA studio. Remote subscription(s): Generated in the schema selected for the virtual table. This is only generated when the Initial load only option is not selected. Task(s): Generated in the same schema as the target table. Target table(s): Populated with the content after execution. Procedure: Generated in the schema of the target table, the procedure performs three functions. Execute a Replication Task Sets the remote subscription to the Queue status. Calls Start Task to perform the initial load of the data. Sets the remote subscription to the Distribute status. Any changes,additions or deletions made to the source data during the initial load are updated in the target system. Any changes to the source data thereafter are updated real time to the target.To begin replicating data, you need to open the container first and find those replication procedures Click Procedures to display all available replication procedures Right click the procedure and select Generate Call Statement Click Run to execute replication task The replication task begins. To monitor the status, select the replication task name in the workspace list and choose Run Launch Tasks Overview. The Database Explorer shows the tasks that have been processed in the container. Measuring Latency You can measure trends by issuing the following statements at HANA server ALTER REMOTE SOURCE START LATENCY MONITORING [INTERVAL ] Starts the collection of latency statistics one time or at regular intervals. The user specifies a target latency ticket in the monitoring view. Let’s take a quick look at result SELECT * From “SYS”.”M_REMOTE_SOURCE_LATENCY_HISTORY” The sequence of components from end to end is SDB (source database) > LRI (LogReader input) > LRO (LogReader output) > SNDR (sender) > FRAMEWORK > RECEIVER > DISTRIBUTOR > APPLIER. So you can take a look at statistics and find the where is bottleneck for your replication performance Data Recovery After a replication task has failed, you can use the lost data tracker in the Data Provisioning Agent command-line configuration tool to identify and correct data inconsistencies that may have occurred.