Introduction:
I am working for a customer and their business is mainly in the oil and gas industry. As you may be aware of SAP’s solution IS-Oil (Industry Specific – oil & gas) is tailor-made for the oil and gas industry. With the emerging of S/4HANA many-core module database tables have been re-designed or new tables are made available for simplicity, redundancy, and performance reasons. So, we are going to take an example based on two important tables which are MATDOCOIL and MATDOCOIL_INDEX.
MATDOCOIL: It is a standard SAP Transparent table in SAP IS-Oil application, which mainly contains additional converted quantities, unit of measures, and Quantity conversion parameters. Up to 60 additional quantities (ADQNTP*) and unit of measure (MSEHI*) can be stored into separate table columns.
MATDOCOIL_INDEX: It is a standard SAP Transparent table in SAP IS-Oil application, the main purpose of this table is to support SAP HANA-optimized aggregation for additional oil quantities in MATDOCOIL through new indexing format; this means that individual additional quantities are stored in separate columns within the table MATDOCOIL and are then referenced within table MATDOCOIL_INDEX. Example: entry in the table with key MSEHI = ‘KG’ and MSEHI_INDEX = 3 means additional converted quantity in KG unit of measure is stored in the columns ADQNTP3, MSEHI3 of the table MATDOCOIL
What is union data pruning?
The simple answer is the technique for the union view node that helps to optimize the query execution at the union node level.
Requirement (Scenario):
To build a material inventory report using SAP HANA calculation views with the user having an option to select a required unit of measure using HANA Input Parameter and Display the additional converted quantity along with a unit of measure. As mentioned in the introduction MATDOCOIL table contains additional converted quantities into different units of measures based on the configuration e.g. KG, BBL, LB, UG6, TON, etc.
Problem:
Additional converted quantity, units of measure are stored in different columns ADQNTP*, MSEHI* in MATDOCOIL table. Hence transpose of columns into rows is required. This is achieved using creating multiple projection nodes (Projection_1 for ADQNTP1, MSEHI1, Projection_2 for ADQNTP2, MSEHI2, and so forth) mapped DATA SOURCES into UNION node in HANA Calculation view with Input Parameters like for unit of measure, etc.
GOOD NEWS! Because we were able to meet the requirement for the report. Reporting is showing the expected numbers, BUT WHAT ABOUT PERFORMANCE? Are the business users happy? The big answer is NO. Often in the analytics projects, many a time no one calls out good(great) performance as a requirement it’s overlooked and later stage work is done as performance improvement. So why not do from Day1 and Be pro-active?
Now we have the requirement and the problem. Let’s JUMP into the main topic now i.e. UNION Pruning to get the best performance with small effort. Broadly there are two options: Option 1. Configuration Table Based Union Pruning. Option 2. Constant Mapping Based Union Pruning. I will try my best to discuss both options with pros, cons, and use cases related to our requirements.
Transpose of Columns into Rows Design.
Option 1. Configuration Table Based Union Pruning.
◉ As the name says it based on a configuration table so there is a need to create a table (.hdbtable) or database view with standard configuration columns. Entries in this table help the HANA engine in run-time to decide on union node which data sources (projection_1, projection_2, etc.) to consider for search rather NOT looking at all projections.
◉ WITHOUT union pruning HANA calculation view EXPLAIN PLAN and Execution time.
1. Performance Measure.
◉ Column search is performed on ALL data sources (projections).
◉ Execution time: ~556 ms for 1000 records.
◉ High memory consumption (Not shown here but was checked in Plan Viz)
◉ WITH union pruning HANA calculation view Steps, EXPLAIN PLAN, and Execution time.
1. Create a pruning configuration table.
3. Maintain the entries in the configuration table. This can be done using manual inserts, CSV upload, stored procedure, etc. If you notice column INPUT which is maintained for different union data sources (projection_1, projection2, projection_22, etc.) and LOW_VALUE column contains different units of measure. This configuration table helps the engine to perform a search only on a specific data source(projection) based on LOW_VALUE which is MSEHI1 and filtered using HANA Input Parameter.
5. Add Projection node with filter after Union node. On column MSEHI1. This is also maintained on the pruning configuration table.
7. To enable pruning on the HANA Calculation view add a pruning configuration table created in the Advanced section setting. Note: If the configuration table is empty then the HANA calculation view will execute as normal WITHOUT union pruning as discussed earlier.
9. Performance Measure.
◉ Column search is performed on ONLY data sources (projections) based on configuration table match found. In this case its on Projection_1 for UoM = ‘BB6’. You see in EXPLAIN PLAN with one search performed only.
◉ Execution time: ~343 ms for 1000 records. Around 38% improvement seen on 1000 records and it bumps up exponential with high data volume like ~50-60%
◉ Less memory consumption (Not shown here but was checked in Plan Viz)
Option 2. Constant Mapping Based Union Pruning.
◉ As the name says it is based on constant mapping. Yes, its union node constant mapping column and filtering it to avoid unwanted search on other data sources (projection nodes).
◉ WITHOUT union pruning HANA calculation view EXPLAIN PLAN and Execution time. Different from the above example because performed in a development environment.
1. Performance Measure.
◉ Column search is performed on ALL data sources (projections).
◉ Execution time: ~107 ms for 1000 records.
◉ High memory consumption (Not shown here but was checked in Plan Viz)
◉ WITH constant mapping union node pruning HANA calculation view Steps, EXPLAIN PLAN, and Execution time.
1. Create stored procedure with input for a unit of measure HANA input parameter and Output to get MSEHIINDEX value. MATDOCOIL_INDEX table used.
3. Create HANA Input Parameter of Type Derived from Procedure/Scalar UDF and use the above-stored procedure. Map the Input Parameters correctly.
5. Create a CONSTANT column in union node with values 1,2, 3..10..11..20.. etc. based on the ADQNTP*, MSEHI* mappings.
7. Add Projection node with filter after Union node. On constant column PRUNING_INDEX. This step enables pruning.
9. Performance Measure.
◉ Column search is performed for one data source (projection_1) without filter condition due to constant mapping in the union node.
◉ Execution time: ~52 ms for 1000 records. Around 51% improvement seen on 1000 records and it bumps up exponential with high data volume like ~60%
◉ Low memory consumption (Not shown here but was checked in Plan Viz)
No comments:
Post a Comment