Sunday, 25 November 2018

Merge Multiple SCD Tables into A Single Dimension

Tracking historical changes within a dimension is a common task in data warehousing.  Achieving this in a HANA sidecar database (SCD) requires an explicit design focus.  In some cases, the data source system maintains dates and time stamps to denote a record or value’s creation, change, or validity threshold.  In SAP systems creation, change, or validity dates are often but not always maintained for a record, dimension, attribute, or value. One such example being SAP Human Capital Management (HCM) mini-master info-types, seen as below:

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

The above info-types are versioned using a Begin (BEGDA) Date and End (ENDDA) Date thereby creating a validity range.  In this post, I am demonstrating a simplified scenario using the following three tables, each with multiple valid from/to timestamps to show how to merge and aggregate the data into an ordered historical result using SAP HANA calculation views.

Execute the below SQL script then preview the resulting tables and data.

CREATE COLUMN TABLE "STAGING"."PA0000"
(
"PERNR" NVARCHAR(8),
"BEGDA" NVARCHAR(8),
"ENDDA" NVARCHAR(8),
"STAT" NVARCHAR(1)
);


CREATE COLUMN TABLE "STAGING"."PA0001"
(
"PERNR" NVARCHAR(8),
"BEGDA" NVARCHAR(8),
"ENDDA" NVARCHAR(8),
"STAT1" NVARCHAR(1)
);


CREATE COLUMN TABLE "STAGING"."PA0007"
(
"PERNR" NVARCHAR(8),
"BEGDA" NVARCHAR(8),
"ENDDA" NVARCHAR(8),
"STAT7" NVARCHAR(1)
);


INSERT INTO "STAGING"."PA0000" VALUES('03151228','20170801','20180116','A');
INSERT INTO "STAGING"."PA0000" VALUES('03151228','20180116','20180502','B');
INSERT INTO "STAGING"."PA0000" VALUES('03151228','20180502','99991231','C');

INSERT INTO "STAGING"."PA0001" VALUES('03151228','20180501','20180829','D');
INSERT INTO "STAGING"."PA0001" VALUES('03151228','20180829','99991231','E');

INSERT INTO "STAGING"."PA0007" VALUES('03151228','20180301','20180511','F');
INSERT INTO "STAGING"."PA0007" VALUES('03151228','20180511','20181016','G');
INSERT INTO "STAGING"."PA0007" VALUES('03151228','20181016','99991231','H');

Each table contains rows for one Personnel ID (PERNR) with changing attribute stat (STAT,STAT1,STAT7) values.  The changes are denoted by Valid-From (BEGDA) and Valid-To (ENDDA) date ranges.  In this example, we will apply an inclusion of BEGDA and exclusion of ENDA.  Each table combinations of PERNR, BEGDA, and ENDDA to create a unique set of keys for the row.

Table: PA0000

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Table: PA0001

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Table: PA0007

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Here is the final result after merging three tables:

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

The information of the three tables is now merged into a single dimension view. Each row represents a version change of attributes with different BEG_DATE and END_DATE. As an example, the first change for employee 03151228 was starting date 20180116 with the STAT value changed from A to B.

So, how do we get there? There are two steps we define to accomplish this.

Task 1 – Collect all dates then build new valid from/to date ranges using the SAP HANA Rank Node.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

1) Collect dates in three PA tables. The BEGDA and ENDDA columns are renamed to DATES

Projection Node: PR_PA0000_BEG

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Projection Node: PR_PA0000_END

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

2) Union all dates using Union Node

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

The result after union (all) is as follows.  There are some duplicate dates such as 20180116, 99991231 etc.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

3) Remove duplicate dates by applying an Aggregation Node. The dates will now be distinct.

4) To partition PERNR, using an ascending sort we generate a RANK ID for each time slice (assume each PERNR will have maximum 100,000 time slices)

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Data preview for RANK_DATES

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

5) Derive a column LEAD_RANK_ID using below formula

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Next we will split the result into two datasets.  The first contains PERNR, DATES, and LEAD_RANK_ID. The second contains PERNR, DATES, and RANK_ID

1ST dataset 

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

2nd dataset

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

6) Apply the following left outer join condition between the first and second dataset. From the 1st dataset, expose DATES and rename it as BEG_DATE. From the 2nd dataset, expose DATES and rename it as END_DATE.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Date preview on JN_BEG_END

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

7)  Add a filter removing the last line where END_DATE is null.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

8) Final result for task 1

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Task 2 – We now have eight rows (see above) with correct date ranges but without the original three table attributes. We can add the corresponding attributes by joining the initial three base tables with our new data ranges. An SAP HANA temporal join is not applicable here as we are working with a date range instead of temporal column such as a posting date.   Instead, we can write a SAP HANA SQL Script to apply a date range look up.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Here is a graphical modeling solution with the final calculation view as below

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

1) Select PA tables etc., PR_PA0000

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

2) Create a left outer join between the new date ranges and PA tables on PERNR only

Left outer join on PERNR

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

See the left outer join results below.  The first row is a valid record as BEG_DATE >= PA00_BEGDA and END_DA <= PA00_ENDDA selecting STAT value A as the correct version.  Note that some rows, such as fourth row, become invalid as BEG_DATE and END_DATE range moves out the row’s PA00_BEGDA and PA00_ENDDA date range.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

3) Create a calculated attribute to flag invalid rows with a null value

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Data preview with calculated attribute

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

4) Apply an aggregation node to compress result, eliminating invalid rows with null CC_STAT values.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

4) Repeat the same steps for PA0001 and PA0007 tables.

SAP HANA Certification, SAP HANA Learning, SAP HANA Tutorial and Materials

Note: In the case where a dimension-based value requires historical perspective but carries no associated create, change or validity dates and/or time, there is the option of deriving the necessary dates during the data warehouse load processes.  Historical attribute views between fact and dimension tables are now possible using SQL or temporal joins.

No comments:

Post a Comment