Wednesday, 10 April 2019

SDI SDQ Geocoding & Address Cleansing

Have you tried using the HANA Smart Data Quality (SDQ), and found the transforms aren’t working?  Well this blog should help, there are country specific files required for both geocoding (latitude, longitude) and address cleansing.

The steps covered below are

1. Download the files
2. Extract the zip file
3. Verify existing configuration
4. Update Configuration
5. Validate Configuration
6. Build SDI Flowgraph
7. Errors

1. Download the files


Locate the required files from SAP Software Downloads.

Addresses and Geocoding are both within the “Addreess Directories & Reference Data” category, and data is held a country level, I am using some UK data so have downloaded the UK GEO DIRECTORY, this will be needed for the geocode SDQ transform.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

You usually want the latest file available
SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

I download these directly onto the HANA box as they can be quite large.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

2. Extract the zip file


The official way to install these file is using the hdblcm tool, but you can also do this manually.  Which can be handy if your hdblcm is not working.  The files need to be owned by the HANA SID ADM user (ih2adm) in my case.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

3. Verify existing configuration


We can check to see if there’s an existing configuration of the SDQ reference data files

select * 
from SYS.M_INIFILE_CONTENTS 
where FILE_NAME = 'scriptserver.ini' 
and SECTION = 'adapter_framework' 
and KEY = 'dq_reference_data_path';

This should not return any rows.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

The same is true in HANA Studio, Administration -> Configuration

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

4. Update Configuration (dq_reference_data_path)


We can go ahead and update our configuration as below, but replacing with your directory path.

-- EXECUTE ON SYSTEMDB
ALTER SYSTEM ALTER CONFIGURATION ('scriptserver.ini', 'SYSTEM') 
SET ('adapter_framework', 'dq_reference_data_path') = '/hana/SDQ_Reference/' WITH RECONFIGURE

After doing this change we need to restart the ScriptServer.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

5. Validate Configuration


Repeating step 3 we should now see our new configuration in place.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

6. Build SDI Flowgraph


We can now switch to the WebIDE and create an SDI (Smart Data Integration) Flowgraph.

Here I have just created a dummy .hdbview that doesn’t contain any real personal data.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

We can then use this in a simple flowgraph as below

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

DataSource

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

Geocode

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

Here we can see the SDQ has automatically guessed correctly the Content Types, this is from the data that is being fed in.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

It makes sense to add our input data as output columns

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

We can now save, build and Execute the Flowgraph

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

Before we celebrate the sucessful execution, we should check our target table.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

All appears to be good, for one final verification we can put the LAT, LONG in to our favourite mapping tool and check the location looks correct.

SAP HANA Tutorial and Material, SAP HANA Guides, SAP HANA Certifications, SAP HANA Study Materials

Errors


If you receive this error below then you address and/or geocode reference data is likely missing.

15:57:12 (/Sample/db) Execution Failed : (dberror) [686]: start task error: “SAMPLE”.”UK_GEOCODE_SP”: line 5 col 0 (at pos 85): [2620] executor: plan operation failed;Execution of Adapter operation within node GeocodeUK_GEOCODE_TEMPLATE_ failed: exception 141005: Failed to create or initialize HanaTransform object

Looking at the diagnosis files in HANA Studio will give you more information.  Check the scriptserver_alert_<hostname>.trc

Here it appears as though I am out of memory, but this is actually caused by a permissions issue.  Make sure the geo or address files are owned by you HANA <SID>ADM OS user.

[65085]{200563}[2/66728980] 2019-04-04 16:08:16.664458 e af_core hanaLogging.cpp(00198) : <GEO0018>: GEO0018GEO_ERROR_DIR_OUT_OF_MEMORY: Out of memory during directory initialization: hana/SDQ_Reference/geo_gb_nt.dir.
[65085]{200563}[2/66728980] 2019-04-04 16:08:16.740559 e af_core hanaTransform.cpp(01240) : GeocodeUK_GEOCODE_TEMPLATE_:1: In HanaTransform, Failed to create transform
[65085]{200563}[2/66728980] 2019-04-04 16:08:16.742133 e CalcEngine cePopAdapter.cpp(00338) : Execution of Adapter operation within node GeocodeUK_GEOCODE_TEMPLATE_ failed: exception 141005: Failed to create or initialize HanaTransform object

1 comment: