Here we are going through how we can use the Predictive Analytics functions provided via PAL in HANA in the Scripted Calculated views .
We are following the SAP provided SAP HANA Predictive Analytics Library ( PAL ) guide to do this.
PAL provides a vast business opportunity and immense business value to cater the needs of Analytics domain in projects implemented on SAP HANA platform.
[1] SAP HANA SPS 12
We can check this here:
Go to HANA System installed in HANA Studio/ Eclipse-> Configuration and Mapping -> Open Administration
We are following the SAP provided SAP HANA Predictive Analytics Library ( PAL ) guide to do this.
PAL provides a vast business opportunity and immense business value to cater the needs of Analytics domain in projects implemented on SAP HANA platform.
Pre-requisites :
[1] SAP HANA SPS 12
We can check this here:
Go to HANA System installed in HANA Studio/ Eclipse-> Configuration and Mapping -> Open Administration
[2] Application Function Library (AFL) is installed .
Check it by executing these queries :
SELECT * FROM "SYS"."AFL_AREAS" WHERE AREA_NAME = 'AFLPAL';
SELECT * FROM "SYS"."AFL_PACKAGES" WHERE AREA_NAME = 'AFLPAL';
SELECT * FROM "SYS"."AFL_FUNCTIONS" WHERE AREA_NAME = 'AFLPAL';
[3] Access is granted to the user for the below roles :
AFL__SYS_AFL_AFLPAL_EXECUTE
AFL__SYS_AFL_AFLPAL_EXECUTE_WITH_GRANT_OPTION
AFLPM_CREATOR_ERASER_EXECUTE
[4] Script Server is Active as seen in Administration Section below
Here are the steps we follow to apply a PAL algorithm on a business data in a scripted view in HANA Modeling :
In simplified sense , below is what we will be doing to apply the PAL functions on business data ; We are however not covering the UI work here:
Business Problem : Group different materials based on their mutual similarities in their physical attributes for a better material management
Steps in Detail:
[ In this blog , we created / configured all the needed table types that we define below and tables needed in the schema “VMARADA” ]
[1] Study the business problem and identify the appropriate statistical algorithm suitable to address the requirement .
[2] Go to SAP HANA Predictive Analytics Library ( PAL ) guide to know the programming syntax of the statistical algorithm
[3] Understand the specific input data needs for the PAL algorithm that we want to use and process the input business data we have got into that format before making it get consumed in the PAL procedure
[4] Create a procedure that wraps the PAL function . These are called Wrapper procedures as we are hiding the actual Statistical algorithm being used but just using its interface to consume our business data and output the expected statistical insight .
[5] Create a Scripted view where we Call this Wrapper procedure to apply on the business data we have to get the predictive insight on that data to get computed.
In this blog , we are seeing how Affinity Propagation algorithm of PAL can be leveraged to derive analytics out of data that is processed in SAP HANA .
What is Affinity Propagation ?
It is a form of Clustering technique which falls in Unsupervised learning category in Statistical Algorithms.
Affinity propagation simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges by running multiple iterations.
In this example, we are taking a data-set on Material with some of the associated physical parameters of Materials like Strength, Stiffness, Hardness, and Ductility etc. for our understanding on using PAL in a scripted view.
This data-set can be via. any HANA Model that is built or from a table.
There are these below pre-requisites to use AF algorithm of PAL on a data-set :
=> No missing or null data in the inputs.
=> The data is numeric, not categorical.
If the initial data-set is not in this format , we need to process the data of the original data-set in our modeling to get it into this format.
In this example data-set we have below , we are going to define the row number on Material Id to get a unique row number to every record. This will be done in the scripted view that is there later in this blog.
This row number will later help us to link back with original record of data set .
We will represent the data of the data-set to PAL Algorithm on Affinity Propagation via this row number as row number is a numeric form of data.
Below steps [A] , [B] and [C] in sequence are to write the code :
[A] CREATION OF THE NEEDED TABLES/TABLE TYPES
To use the PAL function on this data-set , we have to create few table types .
These are as follows :
[A][1] Source data :
This is the initial data set on which we are going to apply the PAL algorithm .
We don’t need to create this table/ table type if we already have it via. any other table / HANA model for us.
For our example , I am creating a table for this and then loading it with data via. Flat File
/* SOURCE DATA */
CREATE COLUMN TABLE "VMARADA"."PAL_AP_DATA_MATID" (
"MANDT" NVARCHAR(3) ,
"MATID" NVARCHAR(8),
"STRENGTH" DOUBLE,
"STIFFNESS" DOUBLE,
"HARDNESS" DOUBLE,
"DUCTILITY" DOUBLE
);"
PFA of the file “PAL_input.csv” where we have the input data .
For our example , We loaded it into this new table we created with the above code.
[A][2] /* SIGNATURE TABLE FOR THE PREDICTIVE ANALYTICS METHOD*/
This table is needed for the PAL function to work on the input data
[A][2][i] Create the table
CREATE COLUMN TABLE "VMARADA".PAL_MATID_AP_SIGNATURE (
"POSITION" INT,
"SCHEMA_NAME" NVARCHAR(256),
"TYPE_NAME" NVARCHAR(256),
"PARAMETER_TYPE" VARCHAR(7)
);
[A][2][ii]Insert below records into this table:
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (1,'VMARADA','AP_MATID_SOURCE_TYPE', 'IN');
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (2,'VMARADA','AP_MATID_SEED_TYPE', 'IN');
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (3,'VMARADA','PAL_CONTROL_T', 'IN');
INSERT INTO "VMARADA".PAL_MATID_AP_SIGNATURE VALUES (4,'VMARADA','AP_MATID_RESULT_TYPE', 'OUT');
[A][3] TABLE TYPES : These are all used in the Wrapper procedure we are going to create for PAL application and in the Scripted code that we call this wrapper procedure to apply on the initial data set:
/* [A][3][1] SOURCE DATA TABLE TYPE */
CREATE TYPE "VMARADA".AP_MATID_SOURCE_TYPE AS TABLE (
ID INTEGER,
STRENGTH DOUBLE,
STIFFNESS DOUBLE ,
HARDNESS DOUBLE,
DUCTILITY DOUBLE
);
/* [A][3][2] Input Seed table Type */
CREATE TYPE "VMARADA".AP_MATID_SEED_TYPE AS TABLE (
ID INTEGER,
SEED INTEGER
);
/*[A][3][3] Result data table type */
CREATE TYPE "VMARADA".AP_MATID_RESULT_TYPE AS TABLE(
ID INTEGER,
RESULT INTEGER
);
/*[A][3][4]Control Table type*/
CREATE TYPE "VMARADA".PAL_CONTROL_T AS TABLE(
NAME VARCHAR(50),
INTARGS INTEGER,
DOUBLEARGS DOUBLE,
STRINGARGS VARCHAR(100)
);
[B] Creating the wrapper procedure to apply PAL
CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'AP', 'VMARADA', 'PAL_AP_MATID',
"VMARADA".PAL_MATID_AP_SIGNATURE);
[C] Creating the Scripted view to process the initial data set to the needed format , calling PAL and outputting the insight derived
HINT : After the script is written , mark the type of the output field CLUSTER in the Semantics section as Attribute as shown below.
Below is the code in the scripted view :
/********* Begin Procedure Script ************/
BEGIN
DECLARE vc "VMARADA".PAL_CONTROL_T ;
DECLARE vs "VMARADA".AP_MATID_SOURCE_TYPE ;
DECLARE vseed "VMARADA".AP_MATID_SEED_TYPE ;
DECLARE vr "VMARADA".AP_MATID_RESULT_TYPE ;
/* COntrol table */
vc = SELECT 'THREAD_NUMBER' AS NAME,
2 AS INTARGS ,
null AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY
UNION ALL
SELECT 'MAX_ITERATION' AS NAME,
500 AS INTARGS ,
null AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY
UNION ALL
SELECT 'CON_ITERATION' AS NAME,
100 AS INTARGS ,
null AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY
UNION ALL
SELECT 'DAMP' AS NAME,
null AS INTARGS ,
0.9 AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY
UNION ALL
SELECT 'PREFERENCE' AS NAME,
null AS INTARGS ,
0.5 AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY
UNION ALL
SELECT 'DISTANCE_METHOD' AS NAME,
2 AS INTARGS ,
null AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY
UNION ALL
SELECT 'CLUSTER_NUMBER' AS NAME,
0 AS INTARGS ,
null AS DOUBLEARGS ,
null AS STRINGARGS
FROM DUMMY;
/* Source Table */
vs0 = SELECT
ROW_NUMBER() over ( order by MATID ) as ID,
MANDT ,
MATID ,
STRENGTH ,
STIFFNESS ,
HARDNESS ,
DUCTILITY
FROM "VMARADA"."PAL_AP_DATA_MATID" ;
vs = SELECT ID ,
STRENGTH ,
STIFFNESS ,
HARDNESS ,
DUCTILITY
FROM :vs0 ;
/* Seed table */
vseed = SELECT 0 as ID ,
0 as SEED
FROM DUMMY ;
vseed = SELECT * FROM :vseed where ID > 0 ;
/* PAL Procedure call */
CALL "VMARADA"."PAL_AP_MATID"(:vs, :vseed, :vc,:vr) ;
var_out = SELECT a.MANDT ,
a.MATID ,
b.RESULT + 1 as CLUSTER ,
a.STRENGTH ,
a.STIFFNESS ,
a.HARDNESS ,
a.DUCTILITY
FROM
:vs0 a
inner join
:vr b on
a.ID = b.ID ;
var_out = SELECT * FROM :var_out order by to_integer(CLUSTER) asc ;
END /********* End Procedure Script ************/
The final output of this scripted view will be as follows where we have Materials clustered into groups based on their affinity with each-other with respect to the physical parameters like Strength, Stiffness, Hardness, and Ductility etc as seen in the attached file “PAL_OUTPUT.csv”.
Below is a brief screen-shot of this output.
Based on the Column ‘CLUSTER’ here , we can get the grouping of different materials.
The different materials based on their mutual similarities in their physical attributes get clustered into different clusters.
This can further get consumed in UI for a more interactive insight for further understanding and analysis .
PFA of all the sql syntax and scripted view created .[ Attachment = “PAL_MATNR_AP.txt” ]
No comments:
Post a Comment