We covered SHAP-explained models for classification and regression scenarios in a previous APL blog post, where we briefly discussed the main effect of a predictor and its interaction effect with the other predictors of the model. With HANA ML 2.17, you can now visualize the interactions between variables in a heatmap. This new visualization complements the Variable Importance bar chart in providing a global explanation of a classification or regression model. The feature requires APL 2311 or later.
We start by connecting to the database and defining the training data:
from hana_ml import dataframe as hd
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)
First, we train a gradient boosting classification model with the interactions parameter set to True:
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
                                             interactions=True)
apl_model.fit(hdf_train, label='class', key='id')
Once model training is complete, we request the report:
from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(apl_model).build().display()
You may want to generate the report as an HTML file:
apl_model.generate_html_report('APL_Census')
The usual “Variable Importance” tab provides a global explanation of the predictive model.
Because we explicitly requested interactions when setting the model parameters, a new “Interaction Matrix” tab appears at the end of the report.
The diagonal shows the main effect of each variable. The interaction matrix includes only the variables with the strongest interactions and is limited to a size of 6×6 by default. For a larger matrix, 9×9 for example, we must specify a maximum number as follows:
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
                                             interactions=True,
                                             interactions_max_kept=8)
apl_model.fit(hdf_train, label='class', key='id')
The larger the matrix, the longer it takes to fit the model.
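One rough way to see why a larger matrix costs more is to count what it has to hold. The sketch below assumes the matrix is symmetric, as SHAP interaction matrices generally are, so each pair of variables is counted once. The helper name matrix_cells is made up for this illustration, and the code does not claim to describe how APL computes the values internally.

```python
def matrix_cells(n):
    """Count main effects and distinct pairs in an n x n symmetric matrix.

    Hypothetical helper for illustration only; not part of hana_ml or APL.
    """
    main_effects = n                    # one diagonal cell per variable
    distinct_pairs = n * (n - 1) // 2   # off-diagonal cells, each pair counted once
    return main_effects, distinct_pairs


for n in (6, 9):
    main, pairs = matrix_cells(n)
    print(f"{n}x{n}: {main} main effects, {pairs} distinct pairs")
# 6x6: 6 main effects, 15 distinct pairs
# 9x9: 9 main effects, 36 distinct pairs
```

The number of pairs grows roughly with the square of the matrix size, which is consistent with the longer fit times mentioned above.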
If needed, one can obtain the interaction values in a pandas dataframe:
df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
df.style.hide(axis='index')
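Once the values are in pandas, they can be reshaped or ranked like any other dataframe. The sketch below assumes the report comes back in long format, with one row per pair of variables. The column names ("Variable 1", "Variable 2", "Value") and the numbers are invented for this illustration, so check df.columns on the real report and adjust accordingly. The helpers to_square and top_pairs are hypothetical names, not part of hana_ml.

```python
import pandas as pd

# Toy stand-in for the collected report. Column names and values are
# made up for illustration; they are NOT taken from an actual APL report.
toy = pd.DataFrame({
    "Variable 1": ["age", "age", "education", "age", "education", "hours"],
    "Variable 2": ["age", "education", "education", "hours", "hours", "hours"],
    "Value": [0.30, 0.05, 0.25, 0.08, 0.02, 0.15],
})


def to_square(long_df, row="Variable 1", col="Variable 2", val="Value"):
    """Pivot long-format pairs into a symmetric square matrix."""
    names = sorted(set(long_df[row]) | set(long_df[col]))
    wide = (long_df.pivot(index=row, columns=col, values=val)
                   .reindex(index=names, columns=names))
    # Fill the empty triangle with the mirrored value of each pair.
    return wide.combine_first(wide.T)


def top_pairs(long_df, n=3, row="Variable 1", col="Variable 2", val="Value"):
    """Return the n strongest interactions, excluding main effects."""
    off_diagonal = long_df[long_df[row] != long_df[col]]
    return off_diagonal.nlargest(n, val)


matrix = to_square(toy)          # main effects end up on the diagonal
strongest = top_pairs(toy, n=1)  # the single strongest pair in the toy data
```

With the real report, the same two helpers would be called on df instead of toy, after mapping the column names to the ones the report actually uses.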