Wednesday, 8 September 2021

NSE: Systematically Measure Memory Savings

This guide will show a methodical approach to finding data with a large memory footprint and calculating the memory storage savings after moving them to disk via Native Storage Extension. The TPC-H dataset is used in this guide.

Assumptions:

◉ Reader has access to a provisioned SAP HANA instance SPS04+

◉ Reader has familiarity with Database Explorer, SAP HANA Cockpit, and HANA System Views

Observing the Memory Footprint

There are a few useful views for viewing the system’s memory footprint: M_CS_TABLES, M_CS_PARTITIONS, M_CS_ALL_COLUMNS.

First, we begin with observing which tables have the largest memory footprint. The following SQL will query the M_CS_TABLES view and converts the values in the size columns from bytes to megabytes.

SELECT

    TABLE_NAME,

    SCHEMA_NAME,

    MEMORY_SIZE_IN_TOTAL / 1024 / 1024 AS MEMORY_TOTAL_IN_MB, 

    MEMORY_SIZE_IN_PAGE_LOADABLE_MAIN / 1024 / 1024 AS MEMORY_SIZE_IN_PAGE_LOADABLE_IN_MB

FROM M_CS_TABLES WHERE SCHEMA_NAME='TPCH' ORDER BY MEMORY_SIZE_IN_TOTAL DESC;

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

Notice: There is quite a large table, LINEITEM. Let’s observe this tables footprint at the column level.

SELECT
    TABLE_NAME,
    COLUMN_NAME,
    MEMORY_SIZE_IN_TOTAL / 1024 / 1024 AS MEMORY_SIZE_IN_TOTAL_IN_MB,
    MEMORY_SIZE_IN_PAGE_LOADABLE_MAIN / 1024 / 1024 AS MEMORY_SIZE_IN_PAGE_LOADBALE_IN_MB
FROM M_CS_ALL_COLUMNS WHERE TABLE_NAME='LINEITEM' ORDER BY MEMORY_SIZE_IN_TOTAL DESC;

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

Notice: Columns L_ORDERKEY, L_COMMENT, L_EXTENDEDPRICE, and L_PARTKEY are quite large. Also, there seems to be multiple date columns that could be good candidates for partitions. Let’s observe these columns to get a sense of the data

SELECT
    L_ORDERKEY,
    L_COMMENT,
    L_EXTENDEDPRICE,
    L_RECEIPTDATE,
    L_SHIPDATE,
    L_COMMITDATE
FROM LINEITEM ORDER BY L_SHIPDATE DESC;

Notice: L_COMMENT is long text data that doesn’t immediately appear useful for day-to-day analytics, it makes sense to offload this column to disk. Also, notice that the L_SHIPDATE range is from 2012-01-02 to 2018-12-01. For typical, year-over-year analysis we only need the latest two years of data. We can make a partition and place all data with a SHIPDATE before 2017 on disk as well.

Before beginning to move data it is useful to use the following views to observe the current Buffer Cache configuration. To view results in these views the user running the query requires adequate system permissions.

SELECT * FROM M_BUFFER_CACHE_STATISTICS;
SELECT * FROM M_BUFFER_CACHE_POOL_STATISTICS;

Right now, the Buffer Cache is configured for 1GB. Altering the Buffer Cache size can be done with the following SQL. This requires a user with INIFILE privilege.

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'system') SET ('buffer_cache_cs', 'max_size') = '<SIZE IN MB>' WITH RECONFIGURE;

Decide Which Data to Unload

Native Storage Extension supports the ability to move portions of a table to disk. This allows for more flexibility in the performance-cost trade off from storing data on disk.

From the simple analysis of the LINEITEM table, the L_COMMENT column can be offloaded and the L_SHIPDATE column can be used to partition the table.

To partition the table on a date column while a table has a primary key, a multilevel partition is required.

ALTER TABLE LINEITEM PARTITION BY HASH (L_ORDERKEY) PARTITIONS 1, RANGE(L_SHIPDATE) (
    PARTITION '1990-01-01' <= VALUES < '2017-12-02', 
    PARTITION OTHERS
 );

Now, the newly made partition and the L_COMMENT column can be offloaded from memory and the memory savings can be calculated.

First, observe the M_CS_TABLES view again to see the partition sizes that were made.

SELECT
    TABLE_NAME,
    PART_ID,
    MEMORY_SIZE_IN_TOTAL / 1024 / 1024 AS MEMORY_SIZE_IN_TOTAL_IN_MB,
    MEMORY_SIZE_IN_PAGE_LOADABLE_MAIN / 1024 / 1024 AS MEMORY_SIZE_IN_PAGE_LOADABLE_IN_MB
FROM M_CS_TABLES WHERE TABLE_NAME='LINEITEM' ORDER BY MEMORY_SIZE_IN_TOTAL DESC;

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

Notice: There is a larger and a much smaller partition, partitions 2 and 1 respectively. Now, set the partition with ID 2 and column L_COMMENT to have the PAGE LOADABLE load unit.

Setting the Load Unit

The SQL syntax for setting load units can be found here. In this case, the SQL was the following.

ALTER TABLE LINEITEM ALTER ("L_COMMENT" NVARCHAR(44) PAGE LOADABLE);
ALTER TABLE LINEITEM ALTER PARTITION 2 PAGE LOADABLE;

Verify the changes in the M_TABLE_PARTITIONS and M_CS_ALL_COLUMNS view where the LOAD_UNIT column can be observed. A value of PAGE indicates data stored on disk and COLUMN indicates data stored in memory.

SELECT 
    TABLE_NAME,
    PART_ID,
    MEMORY_SIZE_IN_TOTAL / 1024 / 1024 AS MEMORY_SIZE_IN_TOTAL_IN_MB,
    MEMORY_SIZE_IN_MAIN / 1024 / 1024 AS MEMORY_SIZE_IN_MAIN_IN_MB,
    LOAD_UNIT
FROM M_CS_TABLES 
WHERE TABLE_NAME='LINEITEM' ORDER BY MEMORY_SIZE_IN_TOTAL DESC; 

SELECT
    TABLE_NAME,
    COLUMN_NAME,
    MEMORY_SIZE_IN_TOTAL / 1024 / 1024 AS MEMORY_SIZE_IN_TOTAL_IN_MB,
    MEMORY_SIZE_IN_MAIN / 1024 / 1024 AS MEMORY_SIZE_IN_MAIN_IN_MB,
   LOAD_UNIT
FROM M_CS_ALL_COLUMNS
WHERE TABLE_NAME='LINEITEM' ORDER BY MEMORY_SIZE_IN_TOTAL DESC;

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

Observe already, that the partition with PART_ID = 2 has gone from using 33.22 MB of main memory to only 0.8194 MB. (33.22-0.8194)/33.22 = 0.975, that’s a 97.5% reduction in main memory usage for this partition. Unload the table completely from memory with the following SQL.

UNLOAD LINEITEM;

Load the data back into memory using the following SQL.

LOAD LINEITEM ALL;

This unloads the LINEITEM table completely from memory and loads it back into memory for column loadable data and the buffer cache for page loadable.

Calculate the Total Memory Savings

To get a picture of what the table size was in memory, disk, and disk in page loadable format we can run this query.

ALTER TABLE TPCH.LINEITEM ALTER ("L_COMMENT" NVARCHAR(44) COLUMN LOADABLE);
ALTER TABLE TPCH.LINEITEM ALTER PARTITION 2 COLUMN LOADABLE;
UNLOAD TPCH.LINEITEM;
LOAD TPCH.LINEITEM ALL;
SELECT SUM(MEMORY_SIZE_IN_MAIN / 1024 / 1024), SUM(DISK_SIZE / 1024 / 1024 ), SUM(DISK_SIZE_IN_PAGE_LOADABLE / 1024 / 1024 ) FROM M_TABLE_PARTITIONS WHERE TABLE_NAME='LINEITEM';

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

We can observe that currently in main memory the table takes up 248.67 MB and like wise in DISK_SIZE if we were to unload it. Also note, since I’ve made the table completly COLUMN LOADABLE by reversing the load unit change I’ve made earlier, that currently there is 0 space taken up in the PAGE_LOADBALE disk storage. Now, I can compare this to the results after I’ve made my partition and L_COMMENT column PAGE LOADABLE.

ALTER TABLE TPCH.LINEITEM ALTER ("L_COMMENT" NVARCHAR(44) PAGE LOADABLE);
ALTER TABLE TPCH.LINEITEM ALTER PARTITION 2 PAGE LOADABLE;
UNLOAD TPCH.LINEITEM;
LOAD TPCH.LINEITEM ALL;
SELECT SUM(MEMORY_SIZE_IN_MAIN / 1024 / 1024), SUM(DISK_SIZE / 1024 / 1024 ), SUM(DISK_SIZE_IN_PAGE_LOADABLE / 1024 / 1024 ) FROM M_TABLE_PARTITIONS WHERE TABLE_NAME='LINEITEM';

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

The result shows that the table now only takes 83.54 MB of main memory and 170.35 MB in the PAGE LOADABLE disk storage. To compare main memory usage, (248.67 – 83.54) / 248.67 = 0.664. A 66.4% reduction in the main memory footprint for this table.

The same steps can be done on the column level as well. I will place it in a single query below.

ALTER TABLE TPCH.LINEITEM ALTER ("L_COMMENT" NVARCHAR(44) COLUMN LOADABLE);
UNLOAD TPCH.LINEITEM;
LOAD TPCH.LINEITEM ALL;
SELECT SUM(MEMORY_SIZE_IN_MAIN / 1024 / 1024) FROM M_CS_ALL_COLUMNS WHERE TABLE_NAME='LINEITEM' AND COLUMN_NAME='L_COMMENT';

ALTER TABLE TPCH.LINEITEM ALTER ("L_COMMENT" NVARCHAR(44) PAGE LOADABLE);
UNLOAD TPCH.LINEITEM;
LOAD TPCH.LINEITEM ALL;
SELECT SUM(MEMORY_SIZE_IN_MAIN / 1024 / 1024) FROM M_CS_ALL_COLUMNS WHERE TABLE_NAME='LINEITEM' AND COLUMN_NAME='L_COMMENT';

Below are the results for the main memory footprint of the L_COMMENT column as column loadbale and page loadable respectively.

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

SAP HANA Exam Prep, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Career, SAP HANA Study Material

No comments:

Post a Comment