Sometimes functional requirements for data warehouses (DWH) include support of time-dependency for master data. For example, in the real world, a manager changes their city from time to time. According to DWH data model, city is an attribute of manager. It’s required to see which city relates to manager at a particular date. In this case, city must be the time-dependent attribute of manager
If you are going to build DWH using SAP BW, you may not worry. SAP BW has proven comprehensive support for time-dependent master data out-of-the-box. But what if you need to build SAP HANA SQL Datawarehouse? Does SAP HANA support time-dependent scenarios like in SAP BW?
The answer is Yes. And SAP HANA has much more than SAP BW in the same area.
All samples from the blog were created in SAP HANA 2.0 SP04 Express Edition. I use SAP Web IDE for SAP HANA to create all table definitions (entities), to fill them with sample data and to execute queries. In the blog I consider 2 scenarios:
1. System versioning (SV): HANA registers timestamp (defined by HANA server) of changing data in a HANA table. User has the option to see what data was in the HANA table at any defined timestamp in the past.
2. Application time travelling (ATT): HANA registers moment of time (date or timestamp, defined by application) of changing data in HANA table. User has the option to see what data was changed by application for any moment of time (date or timestamp) in the past.
Let’s start from System Versioning (SV) scenario.
I created hdbcds-file with 2 entities: info [this is primary entity] and info_hist [this is linked entity to store system versions of data from primary entity]. Please, pay attention, that the primary entity
◉ must have primary key;
◉ must have 2 columns of UTCTimestamp data type (not null) to support SV.
Below is manager.hdbcds file
namespace cvodata.db.tt;
context manager {
entity info {
key Id : Integer not null;
surname : String(100);
city : String(51);
revenue : Integer;
TidFrom : LocalDate not null;
TidTo : LocalDate not null;
SysValidFrom : UTCTimestamp not null;
SysValidTo : UTCTimestamp not null;
};
entity info_hist {
Id : Integer not null;
surname : String(100);
city : String(51);
revenue : Integer;
TidFrom : LocalDate not null;
TidTo : LocalDate not null;
SysValidFrom : UTCTimestamp not null;
SysValidTo : UTCTimestamp not null;
};
};
We need to define connection between info and info_hist entities. We use manager.hdbsystemversioning file for it.
SYSTEM VERSIONING "cvodata.db.tt::manager.info"("SysValidFrom", "SysValidTo") HISTORY TABLE "cvodata.db.tt::manager.info_hist" NOT VALIDATED
Code below is for deleting (if any) and inserting sample records
DELETE FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info";
DELETE FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info_hist";
INSERT INTO "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" VALUES(
1/*Id <INTEGER>*/,
'Smith'/*surname <NVARCHAR(100)>*/,
'Moscow'/*city <NVARCHAR(60)>*/,
100/*revenue <INTEGER>*/,
TO_DATE('2000-01-01','YYYY-MM-DD') /*TidFrom <DATE>*/,
TO_DATE('9999-12-31','YYYY-MM-DD') /*TidTo <DATE>*/
);
INSERT INTO "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" VALUES(
2/*Id <INTEGER>*/,
'Ivanov'/*surname <NVARCHAR(100)>*/,
'Pskov'/*city <NVARCHAR(60)>*/,
10/*revenue <INTEGER>*/,
TO_DATE('2000-01-01','YYYY-MM-DD') /*TidFrom <DATE>*/,
TO_DATE('9999-12-31','YYYY-MM-DD') /*TidTo <DATE>*/
);
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" ORDER BY "Id" ASC
If you are going to build DWH using SAP BW, you may not worry. SAP BW has proven comprehensive support for time-dependent master data out-of-the-box. But what if you need to build SAP HANA SQL Datawarehouse? Does SAP HANA support time-dependent scenarios like in SAP BW?
The answer is Yes. And SAP HANA has much more than SAP BW in the same area.
All samples from the blog were created in SAP HANA 2.0 SP04 Express Edition. I use SAP Web IDE for SAP HANA to create all table definitions (entities), to fill them with sample data and to execute queries. In the blog I consider 2 scenarios:
1. System versioning (SV): HANA registers timestamp (defined by HANA server) of changing data in a HANA table. User has the option to see what data was in the HANA table at any defined timestamp in the past.
2. Application time travelling (ATT): HANA registers moment of time (date or timestamp, defined by application) of changing data in HANA table. User has the option to see what data was changed by application for any moment of time (date or timestamp) in the past.
Let’s start from System Versioning (SV) scenario.
I created hdbcds-file with 2 entities: info [this is primary entity] and info_hist [this is linked entity to store system versions of data from primary entity]. Please, pay attention, that the primary entity
◉ must have primary key;
◉ must have 2 columns of UTCTimestamp data type (not null) to support SV.
Below is manager.hdbcds file
namespace cvodata.db.tt;
context manager {
entity info {
key Id : Integer not null;
surname : String(100);
city : String(51);
revenue : Integer;
TidFrom : LocalDate not null;
TidTo : LocalDate not null;
SysValidFrom : UTCTimestamp not null;
SysValidTo : UTCTimestamp not null;
};
entity info_hist {
Id : Integer not null;
surname : String(100);
city : String(51);
revenue : Integer;
TidFrom : LocalDate not null;
TidTo : LocalDate not null;
SysValidFrom : UTCTimestamp not null;
SysValidTo : UTCTimestamp not null;
};
};
We need to define connection between info and info_hist entities. We use manager.hdbsystemversioning file for it.
SYSTEM VERSIONING "cvodata.db.tt::manager.info"("SysValidFrom", "SysValidTo") HISTORY TABLE "cvodata.db.tt::manager.info_hist" NOT VALIDATED
Code below is for deleting (if any) and inserting sample records
DELETE FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info";
DELETE FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info_hist";
INSERT INTO "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" VALUES(
1/*Id <INTEGER>*/,
'Smith'/*surname <NVARCHAR(100)>*/,
'Moscow'/*city <NVARCHAR(60)>*/,
100/*revenue <INTEGER>*/,
TO_DATE('2000-01-01','YYYY-MM-DD') /*TidFrom <DATE>*/,
TO_DATE('9999-12-31','YYYY-MM-DD') /*TidTo <DATE>*/
);
INSERT INTO "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" VALUES(
2/*Id <INTEGER>*/,
'Ivanov'/*surname <NVARCHAR(100)>*/,
'Pskov'/*city <NVARCHAR(60)>*/,
10/*revenue <INTEGER>*/,
TO_DATE('2000-01-01','YYYY-MM-DD') /*TidFrom <DATE>*/,
TO_DATE('9999-12-31','YYYY-MM-DD') /*TidTo <DATE>*/
);
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" ORDER BY "Id" ASC
Let’s update city for «Id» = 2
UPDATE "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
SET "city" = 'Sankt Petersburg'
WHERE "Id" = 2;
And check current data in the table. We see updated 2nd row. Fields «city» and «SysValidFrom» changed their values.
To get data for the moment of time before executing UPDATE city to ‘Sankt Petersburg’, I execute SELECT with FOR SYSTEM_TIME AS OF.
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR SYSTEM_TIME AS OF '2020-02-24 11:39:27.077257300'
ORDER BY "Id" ASC, "SysValidFrom" ASC;
It returns data at the defined point of time in the past. The associated history entity is internally used for it. We do not query the history entity directly.
If we execute the query for the moment of time in the distant past
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR SYSTEM_TIME AS OF '2019-02-24 11:39:27.077257300'
ORDER BY "Id" ASC, "SysValidFrom" ASC;
We get an empty result set. Because of the primary entity was empty at 2019-02-24.
Similar SELECT-query with FOR SYSTEM_TIME FROM … TO returns all data relevant for the selected period of time
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR SYSTEM_TIME FROM '2020-02-24 11:39:27.077257300' TO '2020-02-24 13:10:12.739263800'
ORDER BY "Id" ASC, "SysValidFrom" ASC;
So, this is how the SV works. Below are some notes regarding primary and system versioning SV (history) entities.
◉ if we execute the same update (for “Id” = 2) twice, HANA adds 2 records in SV-entity. You can execute one-by-one insert, update, delete and insert record again and SV-entity will contain records for all respective changes (except for insert);
◉ you can delete any data from SV-entity anytime;
◉ when you delete record from the primary entity, a new record in SV-entity is created;
◉ no information in SV-entity about what was particularly changed since the previous state. SV-entity contains only values just before changes. So, insert operation do not create a record in SV-entity;
◉ It is possible to add columns to both entities in a single build action, even when both entities have rows;
◉ It is possible to change datatype columns to both entities in a single build action: primary and SV (if entities have rows, not every datatype change is allowed);
◉ It is possible to remove (non-PK and non-system-date) columns from both entities, even if entities have rows
Not supported features are below:
◉ cannot insert or update into SV-entity: (but until the entity is not considered as SV, it could be inserted/updated)
◉ SV-entity must be empty if option VALIDATED is defined in hdbsystemversioning.
If option NOT VALIDATED is defined, SV-entity may contain records during building of .hdbsystemversioning)
◉ SV-entity and primary entity must not have a different number of columns;
◉ SV-entity and primary entity must not have a column with the same name but different type or length;
Application time-traveling (ATT) scenario
I will use primary entity from SV scenario and the same initial data except
◉ Info-hist entity is not required for ATT. Only primary entity is required.
◉ Primary entity must have two fields (type of date/datetime) for validity period.
◉ Primary key must contain one of date fields which defines validity period
◉ configuration file .hdbapplicationtime is mandatory.
◉ configuration file .hdbsystemversioning is not required for ATT
namespace cvodata.db.tt;
context manager {
entity info {
key Id : Integer not null;
surname : String(100);
city : String(51);
revenue : Integer;
key TidFrom : LocalDate not null;
TidTo : LocalDate not null;
};
}
Below is manager.hdbapplicationtime file. Fields TidFrom, TidTo define the period of validity (from application perspective) for particular record
APPLICATION TIME "cvodata.db.tt::manager.info"("TidFrom", "TidTo")
Loading initial data
DELETE FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info";
INSERT INTO "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" VALUES(
1/*Id <INTEGER>*/,
'Smith'/*surname <NVARCHAR(100)>*/,
'Moscow'/*city <NVARCHAR(60)>*/,
100/*revenue <INTEGER>*/,
TO_DATE('2000-01-01','YYYY-MM-DD') /*TidFrom <DATE>*/,
TO_DATE('9999-12-31','YYYY-MM-DD') /*TidTo <DATE>*/
);
INSERT INTO "CVODATA_HDI_DB"."cvodata.db.tt::manager.info" VALUES(
2/*Id <INTEGER>*/,
'Ivanov'/*surname <NVARCHAR(100)>*/,
'Pskov'/*city <NVARCHAR(60)>*/,
10/*revenue <INTEGER>*/,
TO_DATE('2000-01-01','YYYY-MM-DD') /*TidFrom <DATE>*/,
TO_DATE('9999-12-31','YYYY-MM-DD') /*TidTo <DATE>*/
);
Usual SELECT query
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
ORDER BY "Id" ASC, "TidFrom" ASC;
returns the following:
Now let’s update “city” attribute for “Id” = 2 by the following query. We use FOR PORTION ON APPLICATION_TIME here.
UPDATE "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR PORTION OF APPLICATION_TIME
FROM '2015-08-20' TO '2015-12-31' SET "city" = 'Hamburg' WHERE "Id" = 2;
and we get
You can see that HANA has split (automatically) row “Id” = 2 and “city” = Pskov and has added row with the same “Id” and “city” = Hamburg. The validity period is adjusted accordingly.
Below is the example of splitting “Id”=2 one more time
UPDATE "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR PORTION OF APPLICATION_TIME
FROM '2015-10-20' TO '2015-12-31' SET "city" = 'Amsterdam' WHERE "Id" = 2;
and we get
Now let’s check the data, which are valid as of 2016-11-20. We use FOR APPLICATION_TIME AS OF here.
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR APPLICATION_TIME AS OF '2016-11-20';
and we get
.. which are valid as of 2015-11-20
SELECT * FROM "CVODATA_HDI_DB"."cvodata.db.tt::manager.info"
FOR APPLICATION_TIME AS OF '2015-11-20';
and we get
.. which are valid as of 1999-11-20
we get an empty resultset.
Some points regarding ATT
◉ If you use UPDATE without FOR PORTION OF APPLICATION_TIME, it merely updates all rows limited by WHERE-clause [if defined]
◉ You could combine in one entity both system versioning (SV) and application time travel (ATT) scenarios.
◉ In my example, I used LocalDate type for validity period fields. But ATT supports DATETIME fields as well. This is one of the key differentiators from SAP BW, which supports only DATE datatype for the same scenario.
◉ Adding/removing columns are possible for ATT-entities both with data and empty
Restrictions for both scenarios in SAP HANA SQL DWH:
◉ extended SQL DML syntax is not supported (yet) in flowgraph. You have to create stored procedure with appropriate SQL DML to execute UPDATE … FOR PORTION OF APPLICATION_TIME FROM… .
◉ extended SQL-SELECT syntax for SV and ATT are not supported (yet) in HANA Calculation Views in graph mode. You have to create table function with FOR SYSTEM_TIME AS OF and/or FOR APPLICATION_TIME AS OF in SELECT – queries as data foundation for CV.
No comments:
Post a Comment