Monday, 22 April 2019

How to pivot/unpivot in SAP HANA

Introduction


Currently there is no built-in pivot/unpivot function in HANA. In this blog you will find a workaround how to implement this in SQLScript.

Pivot/Unpivot


Pivoting is the transformation from the rows into the columns.
SAP HANA, SAP HANA Certifications, SAP HANA Tutorials and Materials, SAP HANA Guides

Unpivoting is the transformation from the columns into the rows.

Idea & Solution


Even if there does no built-in function in HANA exists, there are workarounds how to do this in SQL. For the pivot-function it is possible to create new columns with CASE/WHEN and filter for particular values.

Pivot


First of all, we will create a test table with three columns. Let’s call the table TEST_A and the columns PROD, DATE_YEAR and SALES.

CREATE TABLE TEST_A(
PROD VARCHAR(1),
DATE_YEAR INT,
SALES INT);

Insert some data into the table and display the values. 

INSERT INTO TEST_A VALUES ('A',2015,123456);
INSERT INTO TEST_A VALUES ('A',2016,234567);
INSERT INTO TEST_A VALUES ('A',2017,345678);
INSERT INTO TEST_A VALUES ('A',2018,456789);
INSERT INTO TEST_A VALUES ('A',2019,567890);

PROD DATE_YEAR SALES
2015  123456 
2016  234567 
2017  345678 
2018  456789 
2019  567890 

Now we can pivot the values with CASE/WHEN

SELECT
PROD,
SUM(CASE WHEN DATE_YEAR = 2015 THEN SALES END) AS YEAR_2015,
SUM(CASE WHEN DATE_YEAR = 2016 THEN SALES END) AS YEAR_2016,
SUM(CASE WHEN DATE_YEAR = 2017 THEN SALES END) AS YEAR_2017,
SUM(CASE WHEN DATE_YEAR = 2018 THEN SALES END) AS YEAR_2018,
SUM(CASE WHEN DATE_YEAR = 2019 THEN SALES END) AS YEAR_2019 
FROM TEST_A 
GROUP BY PROD;

The result of the query is:

PROD YEAR_2015 YEAR_2016 YEAR_2017  YEAR_2018  YEAR_2019
123456 234567 345678 456789  567890 

Well, you don’t want to write this code every time, if you need to pivot a table. Because of this, we can use SQLScript to generate this code and execute it directly. If you have trouble to execute this code, you can also use instead of CREATE OR REPLACE just CREATE.

CREATE OR REPLACE PROCEDURE P_PIVOT(
IN table_name VARCHAR(127),
IN schema_name VARCHAR(127) , 
IN select_columns VARCHAR(2147483647),
IN agg_typ VARCHAR(20),
IN agg_column VARCHAR(127),
IN pivot_column VARCHAR(2147483647),
IN pivot_values VARCHAR(2147483647))
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
READS SQL DATA AS
BEGIN
USING SQLSCRIPT_STRING AS lib;
DECLARE v_table_name VARCHAR(127) = table_name;
DECLARE v_schema_name VARCHAR(127) = schema_name;
DECLARE v_select_columns VARCHAR(2147483647) = select_columns;
DECLARE v_agg_typ VARCHAR(20) = agg_typ;
DECLARE v_agg_column VARCHAR(127) = agg_column;
DECLARE v_pivot_column VARCHAR(2147483647) = pivot_column;
DECLARE v_pivot_values VARCHAR(2147483647) = pivot_values;

DECLARE v_count INT= 0;
DECLARE v_idx INT;
DECLARE v_statement VARCHAR(2147483647);
DECLARE a_pivot_values VARCHAR(127) ARRAY;

/*
if all columns needed, use * to get the column names except v_agg_column,v_pivot_column
*/
IF v_select_columns = '*'
THEN
SELECT string_agg(column_name,',' ORDER BY position) INTO v_select_columns FROM sys.columns 
WHERE table_name = v_table_name
AND schema_name = v_schema_name 
AND column_name NOT IN (v_agg_column,v_pivot_column);
END IF;
/*
if all values in the pivot column should use for pivoting
*/
IF v_pivot_values = '*'
THEN
EXECUTE IMMEDIATE ('select string_agg('||:v_pivot_column ||', '','' order by ' || :v_pivot_column || ') from (select distinct ' || :v_pivot_column || ' from '|| :v_table_name || ')') into v_pivot_values;
END IF;

a_pivot_values := LIB:split_to_array( :v_pivot_values, ',' );
v_count = cardinality(:a_pivot_values);
v_statement = 'select ' || v_select_columns;

/*
generate the statement
*/
FOR v_idx IN 1.. v_count DO
v_statement = v_statement || ', ' || v_agg_typ || '(case when ' || v_pivot_column || ' = ' || :a_pivot_values[:v_idx] || ' then ' || v_agg_column || ' end ) as ' || v_pivot_column || '_' || :a_pivot_values[:v_idx];
END FOR;
v_statement = v_statement || ' from ' || v_table_name || ' group by ' || v_select_columns;
/*
execute the statement
*/
EXECUTE IMMEDIATE v_statement;
END;

You can call this procedure with list of parameters as listed below:

Parameter name Description 
table_name Name of table, which should pivot
schema_name  Name of schema, where the table is created 
select_columns  List of columns, which should display or * 
agg_typ  Type of aggregation like sum, count, min, max etc. 
agg_column  Column, which should aggregate 
pivot_column   Column, which should pivot 
pivot_values  List of values in the pivot_column, which should generate as separate columns or *. Please consider the maximum number of columns. 

For example, the procedure can be called like this:

CALL P_PIVOT('TEST_A', '', 'PROD','SUM','SALES','DATE_YEAR','*');
CALL P_PIVOT('TEST_A', '', 'PROD','SUM','SALES','DATE_YEAR','2015,2016');
CALL P_PIVOT('TEST_A', 'SYSTEM', '*','SUM','SALES','DATE_YEAR','*');

Unpivot


The unpivot functionality can be realized by using the MAP function and SERIES_GENERATE_INTEGER. As the picture above describes, unpivot means generate rows out of columns. The function SERIES_GENERATE_INTEGER can be utilized to generate an integer table, which can be used for joining with the source table to generate multiple rows.

We create now a table with product and several sales years and call it TEST_B.

CREATE TABLE TEST_B(
PROD VARCHAR(1),
SALES_YEAR_2015 INT,
SALES_YEAR_2016 INT,
SALES_YEAR_2017 INT,
SALES_YEAR_2018 INT,
SALES_YEAR_2019 INT);

Insert data into the table and display.

INSERT INTO TEST_B VALUES('A', 123456, 234567, 345678, 456789, 567890);
INSERT INTO TEST_B VALUES('B', 123456, 234567, 345678, 456789, null);

PROD YEAR_2015 YEAR_2016 YEAR_2017  YEAR_2018  YEAR_2019
123456 234567 345678 456789  567890 
B 123456 234567 345678 456789 

Now we can use the functions mentioned above to generate the unpivot table:

SELECT
PROD,
MAP(element_number,
1, '2015',
2, '2016',
3, '2017',
4, '2018',
5, '2019') AS "DATE_YEAR",
MAP(element_number ,
1, SALES_YEAR_2015,
2, SALES_YEAR_2016,
3, SALES_YEAR_2017,
4, SALES_YEAR_2018,
5, SALES_YEAR_2019) AS "SALES" 
FROM TEST_B,
SERIES_GENERATE_INTEGER(1, 1, 6) 
ORDER BY element_number;

The result of the query is:

PROD DATE_YEAR   SALES 
A 2015 123.456
2015  123.456 
2016  234.567
2016  234.567 
2017  345.678 
2017  345.678 
2018  456.789 
2018  456.789 
2019  567.890 
2019 

To automatically generate this code, the following procedure can be used:

CREATE OR REPLACE PROCEDURE P_UNPIVOT(
IN table_name VARCHAR(127),
IN schema_name VARCHAR(127) , 
IN select_columns VARCHAR(2147483647),
IN unpivot_val_name VARCHAR(127),
IN unpivot_col_name VARCHAR(127),
IN unpivot_columns VARCHAR(2147483647),
IN unpivot_column_values VARCHAR(2147483647),
IN include_nulls BOOLEAN DEFAULT TRUE)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
READS SQL DATA AS
BEGIN
USING SQLSCRIPT_STRING AS lib;
DECLARE v_table_name VARCHAR(127) = table_name;
DECLARE v_schema_name VARCHAR(127) = schema_name;
DECLARE v_select_columns VARCHAR(2147483647) = select_columns;
DECLARE v_unpivot_val_name VARCHAR(127) = unpivot_val_name;
DECLARE v_unpivot_col_name VARCHAR(127) = unpivot_col_name;
DECLARE v_unpivot_columns VARCHAR(2147483647) = unpivot_columns;
DECLARE v_unpivot_column_values VARCHAR(2147483647) = unpivot_column_values;
DECLARE v_count INT= 0;
DECLARE v_idx INT;
DECLARE v_statement VARCHAR(2147483647);
DECLARE v_statement_map1 VARCHAR(2147483647) = '';
DECLARE v_statement_map2 VARCHAR(2147483647) = '';
DECLARE v_include_nulls BOOLEAN = include_nulls;
DECLARE a_unpivot_columns VARCHAR(127) array;
DECLARE a_unpivot_column_values VARCHAR(100) array;

a_unpivot_columns = LIB:split_to_array( :v_unpivot_columns, ',' );
a_unpivot_column_values = LIB:split_to_array( :v_unpivot_column_values, ',' );
v_count = cardinality(:a_unpivot_columns);
tbl_pivot_columns = UNNEST(:a_unpivot_columns) AS ("EXCLUDE_COLUMNS");

/*
if all columns needed, use * to get the column names except tbl_pivot_columns
*/
IF v_select_columns = '*'
THEN
SELECT string_agg(column_name,',' ORDER BY position) INTO v_select_columns FROM sys.columns 
WHERE table_name = v_table_name
AND schema_name = v_schema_name 
AND column_name NOT IN (SELECT EXCLUDE_COLUMNS FROM :tbl_pivot_columns );
end IF;

v_statement = 'select ' || v_select_columns || ', ' || 'map(element_number';

/*
generate the statement
*/
FOR v_idx IN 1.. v_count DO
v_statement_map1 = v_statement_map1 || ',' || v_idx  || ',''' || :a_unpivot_column_values[:v_idx] || '''' ;
v_statement_map2 = v_statement_map2 || ',' || v_idx  || ',' || :a_unpivot_columns[:v_idx]  ;
END FOR;
v_statement = v_statement || v_statement_map1 || ') as "'|| v_unpivot_col_name || '", map(element_number ' || v_statement_map2 || ') as "' || v_unpivot_val_name || '" from ' || v_table_name || ', SERIES_GENERATE_INTEGER(1,1,' || v_count+1 || ') order by element_number';

IF v_include_nulls = FALSE
THEN
v_statement = 'select * from (' || v_statement || ' ) where ' || v_unpivot_val_name || ' is not null';
END IF;

EXECUTE IMMEDIATE v_statement;

END;

The input parameters are:

Parameter name Description 
table_name Name of table, which should unpivot
schema_name   Name of schema, where the table is created 
select_columns   List of columns, which should display or * 
unpivot_val_name   Name of column for unpivot value columns 
unpivot_col_name   Name of column for unpivot columns 
unpivot_columns List of unpivot columns 
unpivot_column_values  Values for unpivot columns 
include_nulls If the null values should display, default TRUE. 

For example, the procedure can be called like this:

CALL P_UNPIVOT('TEST_B','SYSTEM','*','VAL','JAHR','SALES_YEAR_2015,SALES_YEAR_2016,SALES_YEAR_2017,SALES_YEAR_2018,SALES_YEAR_2019','2015,2016,2017,2018,2019',FALSE);
CALL P_UNPIVOT('TEST_B','SYSTEM','*','VAL','JAHR','SALES_YEAR_2015,SALES_YEAR_2016,SALES_YEAR_2017,SALES_YEAR_2018,SALES_YEAR_2019','2015,2016,2017,2018,2019');

Further steps


Because the created table structure is not known during compile time, no table typed output parameter can be used. To store the result for later usage, you can add the following code to the procedures to store the result into a dynamically generated table. In addition, you need to define the variable v_new_table_name, which defines the table to be used for storage. Please note, that the usage of dynamic executed DDL statements is not recommended:

if exists(select 1 from tables where table_name = v_new_table_name and schema_name = v_schema_name)
then
execute immediate 'drop table ' || v_new_table_name;
end if;

execute immediate 'create table ' || v_new_table_name || ' as ( ' || v_statement || ')';

The procedures using dynamic SQL to execute the generated SELECT statement.

No comments:

Post a Comment