Saturday, 24 June 2023

SHAP Interaction values with Automated Predictive (APL)

We already covered SHAP-explained models for classification and regression scenarios in a previous APL blog post, where we briefly discussed the main effect of a predictor and its interaction effect with the other predictors of the model. With HANA ML 2.17, you can now visualize the interactions between variables in a heatmap. This new visualization complements the Variable Importance bar chart in providing a global explanation of the classification or regression model. The feature requires APL 2311 or a later version.

This blog walks you through an example using the Census dataset that comes with APL. We start by connecting to the database and defining the training dataframe:

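from hana_ml import dataframe as hd
# Connect with a key stored in the HANA secure user store (hdbuserstore)
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
# Define the training data as a HANA dataframe; the rows stay in the database
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)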

First, we train a gradient boosting classification model with the interactions parameter set to True:

from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, 
                                             interactions=True)
apl_model.fit(hdf_train, label='class', key='id')

When the model training is completed, we ask for the report:

from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(apl_model).build().display()

You may want to generate the report as an HTML file:

apl_model.generate_html_report('APL_Census')

The usual “Variable Importance” tab provides a global explanation of the predictive model.

[Figure: Variable Importance tab]

But because we explicitly requested the interactions when setting the model parameters, a new tab “Interaction Matrix” appears at the end:

[Figure: Interaction Matrix tab]

The diagonal shows the main effect of each variable. The interaction matrix presents only the variables with the highest interactions. By default, it is limited to a size of 6×6. For a larger matrix, 8×8 for example, we must specify the maximum number of variables to keep with interactions_max_kept, as follows:

apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, 
                                             interactions=True, 
                                             interactions_max_kept=8)
apl_model.fit(hdf_train, label='class', key='id')

[Figure: larger interaction matrix]

The larger the matrix, the longer it takes to fit the model.
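
To make the layout of the matrix concrete, here is a minimal sketch that computes SHAP interaction values by brute force for a single row of a toy model. It does not use APL and is not how APL computes or aggregates its matrix; the model, the input row and the all-zero baseline are hypothetical and chosen so the result can be checked by hand. The formulas are the standard Shapley value and Shapley interaction index, with the main effect of a variable defined as its SHAP value minus its interactions with the other variables.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical model: two main effects plus one pairwise interaction.
    # x[2] is deliberately unused, so it should contribute nothing.
    return 2 * x[0] + x[1] + 3 * x[0] * x[1]


def value(subset, x, baseline):
    # Features in `subset` keep their real value, the others take the baseline.
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return toy_model(mixed)


def subsets(items):
    # Every subset of `items`, from the empty set to the full set.
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def shap_value(i, x, baseline):
    # Exact Shapley value of feature i (exponential cost, toy sizes only).
    m = len(x)
    others = [k for k in range(m) if k != i]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(m - len(s) - 1) / factorial(m)
        total += weight * (value(s | {i}, x, baseline) - value(s, x, baseline))
    return total


def shap_interaction(i, j, x, baseline):
    # Exact SHAP interaction value between two distinct features i and j.
    m = len(x)
    others = [k for k in range(m) if k not in (i, j)]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(m - len(s) - 2) / (2 * factorial(m - 1))
        delta = (value(s | {i, j}, x, baseline) - value(s | {i}, x, baseline)
                 - value(s | {j}, x, baseline) + value(s, x, baseline))
        total += weight * delta
    return total


def interaction_matrix(x, baseline):
    # Off-diagonal cells hold interactions; the diagonal holds main effects.
    m = len(x)
    matrix = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                matrix[i][j] = shap_interaction(i, j, x, baseline)
        off_diagonal = sum(matrix[i][j] for j in range(m) if j != i)
        matrix[i][i] = shap_value(i, x, baseline) - off_diagonal
    return matrix


matrix = interaction_matrix(x=[1, 1, 1], baseline=[0, 0, 0])
for row in matrix:
    print([round(v, 6) for v in row])
```

For this toy row the diagonal recovers the main effects 2 and 1, the interaction of 3 between the first two variables is split evenly across the two symmetric cells (1.5 each), and the unused third variable gets a row of zeros. All cells sum to the difference between the prediction and the baseline prediction.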

If needed, one can obtain the interaction values in a pandas dataframe:

df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
df.style.hide(axis='index')

[Figure: interaction values displayed as a pandas dataframe]
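
Once the values are in pandas, they can be reshaped or ranked like any other dataframe. The sketch below assumes the report comes back in long format, one row per pair of variables; the column names ("Variable 1", "Variable 2", "Interaction") and the numbers are made up for illustration, so check the actual columns of your dataframe and adapt the names.

```python
import pandas as pd

# Stand-in for the dataframe returned by get_debrief_report(...).collect().
# Column names and values are hypothetical, not taken from an APL report.
df = pd.DataFrame({
    "Variable 1": ["age", "age", "education", "education"],
    "Variable 2": ["age", "education", "age", "education"],
    "Interaction": [0.40, 0.10, 0.10, 0.30],
})

# Reshape to a square matrix: rows and columns are variables,
# the diagonal holds the main effects.
matrix = df.pivot(index="Variable 1", columns="Variable 2", values="Interaction")
print(matrix)

# Rank the off-diagonal pairs, keeping each symmetric pair only once.
pairs = (
    df[df["Variable 1"] < df["Variable 2"]]
    .sort_values("Interaction", ascending=False)
    .reset_index(drop=True)
)
print(pairs)
```

The square form is convenient for plotting your own heatmap, while the ranked list answers the question "which pairs of variables interact the most" directly.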
