SAP HANA History Table - Time traveling using SAP HANA

SAP HANA History table is a way to support time travel queries. Time travel queries are queries against historical states of the database. A history table is a special type of table in HANA which records the period of time when a row is valid. With history tables, you can easily track and analyze changes in your business and accurately compare data from two points in time.

In this article we will see what history table is, how history table is important for business analysis and how we can implement and use history table in SAP HANA.

What is History Table in SAP HANA?

SAP HANA history table is also known as 'HISTORY COLUMN' tables.

SAP HANA History table is a way to support time travel queries. Write operations on history tables do not physically overwrite existing records. Instead, write operations always insert new versions of the data record into the database. The different versions of a data record have timestamp-like attributes that indicate their validity. Update and deletion of a record in history table result in a new version of the existing record, with the original record being invalidated.

History tables can be used just like other tables but allow for so called time-travel queries (AS OF UTCTIMESTAMP extension). With these you can access the data in a table as it has been at the timestamp provided.


"$validfrom$" and "$validto$" are hidden column of history tables. These 2 columns are updated whenever there is an insert/update/delete on the history table. 

A question mark (?) under the $validto$ column represents that the data in these rows is still valid (they don’t have any end date yet). 

Note: In SAP HANA "$validfrom$" and "$validto$" contains Commit ID instead of timestamp. This example is showing timestamp just to make the concept easy to understand. 
  1. Create History Tables
  2. Turn Auto Commit Off
  3. Keep track of commit IDs (or connection IDs) dealing with the History tables
  4. Execute Time Travel Queries on the History Tables

Create SAP HANA History Tables:

Syntax for creating a History Table:
CREATE HISTORY COLUMN TABLE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE"
              ("ID" INTEGER NOT NULL,
              "NAME" VARCHAR(20),
              CITY VARCHAR(20),
              PRIMARY KEY ("ID")
);


Note:
• A history table must have a primary key.
• History table have session type as ‘HISTORY’
• The session type can be checked from the column, SESSION_TYPE of the system table SYS.TABLES. Use the following code to check the session type:
SELECT SCHEMA_NAME, TABLE_NAME, SESSION_TYPE, TABLE_TYPE 
FROM SYS.TABLES 
WHERE TABLE_NAME = 'TEST_HISTORY_TABLE'


Turn Auto Commit Off in SAP HANA:

In order for time travel to work, the Auto commit option of SAP HANA DB needs to be turned off. This limits the number of Commit IDs that needs to be worked with. Besides, if not done, an exception will be thrown with an appropriate error message. 

Steps for turning off Auto Commit Option:
1. Open the Properties tab of the SQL Editor: In SAP HANA editor, when the SQL Editor is open, click: Windows -> Show View -> Properties. 


2. In the properties window at the bottom of screen, select ‘Off’ against “Auto Commit” option. 


Keeping track of commit IDs / connection IDs:

lets us first add some data into history table.
INSERT INTO "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" VALUES (1001,'Christina','Berlin'); 
INSERT INTO "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" VALUES (1002,'Philip', 'London' );
INSERT INTO "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" VALUES (1003,'John', 'New York' );
COMMIT;
UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" MERGE DELTA INDEX;

UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" SET CITY = 'Miami' WHE
RE NAME = 'John';
COMMIT;
UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" MERGE DELTA INDEX;

The historical state of the History table can be referenced in two ways, either by using the Commit ID, or via UTC time stamping. Details on actual usage are described in the next section (Executing Time Travel Queries). 

Since Auto Commit has been turned off, the COMMIT command is necessary to ensure that the data is inserted into the table. So after every transaction is complete, commit statement should be used to commit the values into the database. Each commit statement in SAP HANA is marked internally with a commit ID, which can be viewed using following syntax: 
SELECT LAST_COMMIT_ID FROM M_TRANSACTIONS 
WHERE CONNECTION_ID = CURRENT_CONNECTION;

Commit ID can also be obtained from M_HISTORY_INDEX_LAST_COMMIT_ID after each commit. 
SELECT LAST_COMMIT_ID FROM M_HISTORY_INDEX_LAST_COMMIT_ID WHERE SESSION_ID = CURRENT_CONNECTION;

These values are used later on to reference back to any historical state of the database, when the transactions were performed on the database. 

The history of transactions can be validated by viewing the default hidden columns ‘$validfrom$’ and ‘$validto$’ of the history tables: 
SELECT *, "$validfrom$", "$validto$" FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" ('REQUEST_FLAGS'='ALLROWS');



"$validfrom$" and "$validto$" are hidden column of history tables. These 2 columns are updated whenever there is an insert/update/delete on the history table. 

A question mark (?) under the $validto$ column represents that the data in these rows is still valid (they don’t have any end date yet). 

Executing Time Travel Queries:

There are a number of ways of using “Time Travel” queries. We can either track the historic values using the commit ID for the transactions, or by using the UTC time-stamping for the transactions, or by setting the session back to some earlier historic session. 

Getting all the Commit IDs and UTC Timestamps:
The entire list of all commit IDs/Commit-time can be obtained using the following syntax:
SELECT * FROM TRANSACTION_HISTORY ORDER BY COMMIT_ID;

This result into the following list of Commit IDs: 


Note: In our case, the number of transaction was limited, so we have listed the Commit IDs for all the transactions made. In case this list is huge, the relevant commit IDs can be tracked by adding where clause in the above statement, as mentioned in following example:
SELECT * FROM TRANSACTION_HISTORY 
WHERE COMMIT_ID >= 10 ORDER BY COMMIT_ID;

Now the TIME TRAVEL queries can be executed, i.e. historical values can be retrieved in following possible ways:
  1. SELECT VALUES AS OF COMMIT_ID
  2. SELECT VALUES AS OF UTCTIMESTAMP

SELECT VALUES AS OF COMMIT_ID:

Use the select statement with phrase ‘AS OF COMMIT ID’ to refer to some historical values of the history table. 

The historical values of the table, when the tables were first committed (i.e. Commit ID= 10) can be derived using the following syntax: 

SELECT * FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" AS OF COMMIT ID 10;

Example:
In this example we will show how exactly we can use the Commit ID to traverse thru history table. 
Let us start from the creating the table. If you have already created the above mentioned history table TEST_HISTORY_TABLE then drop it using the query. 
DROP TABLE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE"; 
COMMIT;

Step 1: Create History table.
CREATE HISTORY COLUMN TABLE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" 
              ("ID" INTEGER NOT NULL,
              "NAME" VARCHAR(20),
              CITY VARCHAR(20),
              PRIMARY KEY ("ID"));
COMMIT;

Step 2: Insert some records in table.
INSERT INTO "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" VALUES (1001,'Christina','Berlin'); 
INSERT INTO "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" VALUES (1002,'Philip', 'London' );
INSERT INTO "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" VALUES (1003,'John', 'New York' );
COMMIT;
UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" MERGE DELTA INDEX;

Step 3: Fetch the Commit ID related with INSERT and save it somewhere.
SELECT LAST_COMMIT_ID FROM M_TRANSACTIONS 
WHERE CONNECTION_ID = CURRENT_CONNECTION;
Output: 100. 

Step 4: Update a records.
UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" SET CITY = 'Miami' WHERE NAME = 'John';
COMMIT;
UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" MERGE DELTA INDEX;

Step 5: Fetch the Commit ID related with UPDATE and save it somewhere.
SELECT LAST_COMMIT_ID FROM M_TRANSACTIONS 
WHERE CONNECTION_ID = CURRENT_CONNECTION;

Output: 200. 
Step 6: Delete a record.
DELETE FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" WHERE NAME = ‘Christina’; 
COMMIT;
UPDATE "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" MERGE DELTA INDEX;

Step 7: Fetch the Commit ID related with UPDATE and save it somewhere.
SELECT LAST_COMMIT_ID FROM M_TRANSACTIONS 
WHERE CONNECTION_ID = CURRENT_CONNECTION;
Output: 300. 

Fetch data from history table based on different Commit IDs:
Fetch the data for Commit ID 100.
SELECT * FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" AS OF COMMIT ID 100;


After the Commit ID 200, we updated one record. Fetch the data for Commit ID 200. 
SELECT * FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" AS OF COMMIT ID 200;


After Commit ID 300 we deleted one record. Fetch the data for Commit ID 300.
SELECT * FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" AS OF COMMIT ID 300;


Note: Using 'REQUEST_FLAGS'='ALLROWS' we can fetch all the records of history table. The hidden columns "$validfrom$" and "$validto$" are also very handy if you want to check the Commit IDs.
SELECT *, "$validfrom$", "$validto$" FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" ('REQUEST_FLAGS'='ALLROWS');


This way, simple (or bit complex) queries can be used to browse through historic data without having to actually load the old data, or keeping multiple copies of data in backup, saving enormous amount of storage space and efforts. 

SELECT VALUES AS OF UTCTIMESTAMP:

Using the select statement with phrase ‘AS OF UTCTIMESTAMP’ also refers to historical values of the history table based on the UTC time stamps made in the history table. 

As already discussed, the UTC Timestamp can be identified from the SYS.TRANSACTION_HISTORY table. 

If specific commit ID is known, corresponding commit time can be identified using following syntax: 
SELECT COMMIT_TIME FROM SYS.TRANSACTION_HISTORY WHERE COMMIT_ID = 100;
Output: '2014-02-04 12:10:42.300

This timestamp can be used to refer to historical values of the tables using the following syntax:
SELECT * FROM "<SCHEMA_NAME> "."TEST_HISTORY_TABLE" AS OF UTCTIMESTAMP '2014-02-02 04:11:27.73' ;


SET HISTORY SESSION TO COMMIT ID:

A database session can be set back to a certain point-in-time, based on the provided value for Commit ID. For this SET HISTORY SESSION can be used.
SET HISTORY SESSIONTO COMMIT ID 100;
SELECT * FROM "<SCHEMA_NAME> "."TEST_HISTORY_TABLE";
This would result in same output as above, since we are referring to the same instance of the database at exactly the same historic time as the above query. 

By setting the HISTORY SESSION to a particular Commit ID, all subsequent queries will return data as of that Commit ID, until the command is cancelled by setting the history session to another value, or cancelling it by setting it to NOW. 
SET HISTORY SESSION TO NOW;


SET HISTORY SESSION TO UTCTIMESTAMP:

Similar to above method, the database session can be set back to a certain point-in-time, based on the provided value for UTC time stamp.
SET HISTORY SESSIONTO UTCTIMESTAMP ‘2014-02-02 04:11:27.73’; 
SELECT * FROM "<SCHEMA_NAME> "."TEST_HISTORY_TABLE";

Again, we would get the same output. In a way, these (time travelling via using SELECT statement with commit ID, UTC Timestamps or by using SET HISTORY SESSION) are all different ways of reaching to the same results. 

Delete Data From History Table:

We can delete the data from history table by using the commands. 
DELETE HISTORY FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE";

When using the DELETE HISTORY command time travel queries referencing the deleted rows may still access these rows. In order to physically delete these rows the following statements have to be executed:
ALTER SYSTEM RECLAIM VERSION SPACE; 
MERGE HISTORY DELTA of "<SCHEMA_NAME>"."TEST_HISTORY_TABLE";
In some cases even the execution of the two statements above may not lead to physical deletion. 

To check whether the rows are physically deleted, the following statement can be helpful:
SELECT *, "$validfrom$", "$validto$" FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" ('REQUEST_FLAGS'= ('ALLCOMMITTED','HISTORYONLY'));

Note: The "WITH PARAMETERS ('REQUEST_FLAGS'= ('ALLCOMMITTED','HISTORYONLY'))" clause may only be used for the purpose of validating the result of using the DELETE HISTORY statement. 

Example: Suppose we want to delete only history data which is older than 6 month. We can do it by first setting the SESSION to 6 month back’s timestamp and deleting the history data.
SET HISTORY SESSION TO UTCTIMESTAMP '2013-07-00 00:00:00.000'; 
DELETE HISTORY FROM "<SCHEMA_NAME>"."TEST_HISTORY_TABLE" ;

SAP HANA History Table SQL Reference:

No comments:

Post a Comment