If you are a user of APL time series, you probably have seen models fitting a linear trend or a quadratic trend to your data. With version 2113 the Automated Predictive Library introduces an additional method called Piecewise Linear that can detect breakpoints in your series. You don’t have to do anything new to take advantage of this functionality, the trend is detected automatically as shown in the example below.
For SAP Analytics Cloud users, note that Piecewise Linear Trend is coming with the 2021.Q3 QRC (August release).
This article presents two ways of using APL: Python notebook, and SQL script. Let’s start with Python.
First, we connect to the HANA database.
# Connect using the HANA secure user store
from hana_ml import dataframe as hd
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
You may want to check that the Automated Predictive Library on your HANA server is recent enough.
import hana_ml.algorithms.apl.apl_base as apl_base
df = apl_base.get_apl_version(conn)
v = df[df.name.eq('APL.Version.ServicePack')].iloc[0]['value']
print('APL Version is ' + v)
Don’t forget to sort the series over time before giving it to APL.
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."BOSTON_MARATHON" ORDER BY "THE_DATE"'
hdf_in = hd.DataFrame(conn, sql_cmd)
df = hdf_in.collect()
This is how the series looks like.
import matplotlib.pyplot as plt
plt.figure(figsize=(12,5))
plt.plot(df.THE_DATE, df.WINNING_TIME)
plt.title('Boston Marathon Winning Time')
plt.xlabel('Year')
plt.ylabel('Men Times in minutes')
plt.grid()
plt.show()
We ask APL to build a time series model and make a forecast 3 years ahead.
from hana_ml.algorithms.apl.time_series import AutoTimeSeries
model = AutoTimeSeries(time_column_name= 'THE_DATE', target= 'WINNING_TIME', horizon= 3)
hdf_out = model.fit_predict(hdf_in)
And then we display the forecasted values.
df = hdf_out.collect()
import pandas as pd
df['THE_DATE'] = pd.to_datetime(df['THE_DATE'])
df = df.set_index('THE_DATE')
plt.figure(figsize=(12,5))
ax1 = df.ACTUAL.plot(color='royalblue', label='Actual')
ax2 = df.PREDICTED.plot(color='darkorange', label='Forecast', linestyle='dashed')
h1, l1 = ax1.get_legend_handles_labels()
plt.legend(h1, l1, loc=1)
plt.title('Boston Marathon Winning Time')
plt.xlabel('Year')
plt.ylabel('Men Times in minutes')
plt.grid()
plt.show()
The forecast line shows a piecewise trend with two breakpoints. This is confirmed by the components information.
d = model.get_model_components()
components_df = pd.DataFrame(list(d.items()), columns=["Component", "Value"])
components_df.style.hide_index()
We display some forecasting accuracy indicators from our APL model.
import numpy as np
d = model.get_performance_metrics()
# Average each indicator across the horizon time window
apm = []
for k, v in d.items():
apm.append((k, np.mean(v)))
metric = [apm for apm in apm if apm[0] =='L1'][0][1]
print("MAE is {:0.3f}".format(metric))
metric = [apm for apm in apm if apm[0] =='MAPE'][0][1] *100
print("MAPE is {:0.2f}%".format(metric))
metric = [apm for apm in apm if apm[0] =='L2'][0][1]
print("RMSE is {:0.3f}".format(metric))
We are done with our Python notebook. We said we will run the same example using SQL.
SQL code to check what the version of APL is.
call SAP_PA_APL."sap.pa.apl.base::PING"(?)
Sample SQL code to build the forecasting model and query some of the output tables.
--- Input Series
create view TS_SORTED as select * from APL_SAMPLES.BOSTON_MARATHON order by "THE_DATE" asc;
--- Output Series
create local temporary table #FORECAST_OUT (
"THE_DATE" DATE,
"WINNING_TIME" DOUBLE,
"kts_1" DOUBLE );
DO BEGIN
declare header "SAP_PA_APL"."sap.pa.apl.base::BASE.T.FUNCTION_HEADER";
declare config "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_CONFIG_EXTENDED";
declare var_desc "SAP_PA_APL"."sap.pa.apl.base::BASE.T.VARIABLE_DESC_OID";
declare var_role "SAP_PA_APL"."sap.pa.apl.base::BASE.T.VARIABLE_ROLES_WITH_COMPOSITES_OID";
declare out_log "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_LOG";
declare out_sum "SAP_PA_APL"."sap.pa.apl.base::BASE.T.SUMMARY";
declare out_indic "SAP_PA_APL"."sap.pa.apl.base::BASE.T.INDICATORS";
declare out_debrief_metric "SAP_PA_APL"."sap.pa.apl.base::BASE.T.DEBRIEF_METRIC_OID";
declare out_debrief_property "SAP_PA_APL"."sap.pa.apl.base::BASE.T.DEBRIEF_PROPERTY_OID";
:header.insert(('Oid', 'Marathon'));
:config.insert(('APL/Horizon', '3',null));
:config.insert(('APL/TimePointColumnName', 'THE_DATE',null));
:var_role.insert(('THE_DATE', 'input',null,null,'Marathon'));
:var_role.insert(('WINNING_TIME', 'target',null,null,'Marathon'));
"SAP_PA_APL"."sap.pa.apl.base::FORECAST_AND_DEBRIEF"(:header,:config,:var_desc,:var_role,'USER_APL','TS_SORTED', 'USER_APL','#FORECAST_OUT',out_log,out_sum,out_indic,out_debrief_metric,out_debrief_property);
select * from "SAP_PA_APL"."sap.pa.apl.debrief.report::TimeSeries_Components"(:out_debrief_property, :out_debrief_metric);
select * from "SAP_PA_APL"."sap.pa.apl.debrief.report::TimeSeries_Performance"(:out_debrief_property, :out_debrief_metric) where "Partition" = 'Validation';
END;
select THE_DATE as "Date", WINNING_TIME as "Actual", round("kts_1",3) as "Forecast"
from #FORECAST_OUT
order by 1;
No comments:
Post a Comment