My project literally starts from scratch. I have some experience in operating SAP BW systems and troubleshooting various performance issues, but that always was human experience applied to a specific technical problem. Now there is an (at least for me) completely new field: Applying Machine Learning to operations. During the last years I read a lot about the AI hype. AI was applied almost exclusively to business cases. This means to me stuff inside the SAP applications or some other business software layers. I haven’t really seen a case for trying to apply AI or Machine Learning below, in the ordinary operations. Sure, there is lots of automation in operations, but hardly anyone that I am aware of talks about AI or ML in operations. This makes it hard to use some best practices which might guide me.
Before I can dive into the Machine Learning part, I need to talk about data first. At least I can use my experience in operations to gather data which I believe to be relevant. The better the data quality and the more relevant the metrics, the better the performance of my Machine Learning models will be. My approach tries to follow the Data Science Pipeline from Professor Naumann of the HPI:
1. capture: data is captured by SAP BW in its database tables
2. extraction: data is extracted by me via SQL statements
3. curation: metrics are calculated by the very same SQL statements
4. storage: I collect my metrics in a new relational database
5. search: not required, the data volume is pretty small
6. sharing: not required, this is just a prototype
7. querying: done in Anaconda/Jupyter
8. analysis: done in Anaconda/Jupyter
9. visualization: done in Anaconda/Jupyter
Now if you remember my schematically view of the BW system, this was the guideline to identify BW-specific workload and performance metrics:
What data to collect?
Before I can dive into the Machine Learning part, I need to talk about data first. At least I can use my experience in operations to gather data which I believe to be relevant. The better the data quality and the more relevant the metrics, the better the performance of my Machine Learning models will be. My approach tries to follow the Data Science Pipeline from Professor Naumann of the HPI:
1. capture: data is captured by SAP BW in its database tables
2. extraction: data is extracted by me via SQL statements
3. curation: metrics are calculated by the very same SQL statements
4. storage: I collect my metrics in a new relational database
5. search: not required, the data volume is pretty small
6. sharing: not required, this is just a prototype
7. querying: done in Anaconda/Jupyter
8. analysis: done in Anaconda/Jupyter
9. visualization: done in Anaconda/Jupyter
Now if you remember my schematically view of the BW system, this was the guideline to identify BW-specific workload and performance metrics:
Figure 1: created by me
To start chronologically in this picture, let’s start on the top: I want to collect details on the InfoPackages and DTPs. Actually, they are about loading rows, the require some processing time and there are different types of InfoPackages. Since I don’t want to burden the productive systems with my data extraction, I decided to only collect data once a day and aggregated on days. For workload metrics this will be a daily sum and for performance metrics this will be a daily average. Now I have to give my metrics some fancy names. I get some inspiration by the column names of the base tables and try to use some prefix for each metric group.
Table RSDDSTATWHM provides the data on InfoPackages. My metrics are:
IP_INFOPAKIDS: number of InfoPackages loaded on this day
IP_SECONDS: overall processing time for these InfoPackages
IP_RECORDS: total number of rows loaded by these InfoPackages
IP_AVG: average processing time per row on that day
Table RSDDSTATDTP has got the equivalent data on DTPs, but with more details in the columns. Here my metrics are:
DTP_REQUESTS: number of DTP requests on that day
DTP_DATAPAKS: number of DTP data packages which have been loaded
DTP_SECONDS: overall processing time for these DTPs
DTP_RECORDS: number of rows loaded by these DTPs
DTP_GB: total DTP data volume in GB on that day
DTP_AVG: average processing time per row on that day
Table RSDDSTATINFO provides details on BW queries. Here we see. Please note that there is a huge difference between the first navigational step (STEPCNT=’0001′) of a BW query and all subsequent steps. The first navigational step can easily run for 10 minutes, even with a BWA in place. The next navigational steps within that query are way faster, typically faster than 1 second. So, my metrics also provide details on the initial navigational steps:
Q_NAVSTEPS: total number of navigational steps on that day
Q_INISTEPS: number of initial navigational steps on that day
Q_USERS: number of users who executed BW queries
Q_QUERIES: number of different BW queries which were executed
Q_RT_SUM: total sum of the runtime for all BW queries
Q_RT_AVG: average runtime of a navigational step
Q_RT_MED: medium runtime of a navigational step
Q_RT_INI: average runtime of the initial navigational steps
RSDDSTATBIAUSE is a table containing usage metrics of the BWA. For better or worse, I just collect them verbatim, sine I know of no other source for BWA usage data. The column names are relatively self-explaining:
BIA_USED, BIA_INA(ctive), DB_FALLBACK, BIA_NOTPOSS(ible), NO_BIA
Depending on the complexity of a BW query, many calls to the BWA could be triggered and registered in BIA_USED.
Then I collect 15 metrics for BWonOracle on the overall data volume and 7 metrics for BWonHANA. This is mainly because transaction DBACOCKPIT doesn’t keep a long history. Probably these metrics are not suitable for training models, but maybe I find some interesting correlations:
DV_SEGMENTS: total number of segments in the Oracle database
DV_TABLES: total number of tables in the Oracle database
DV_NUMROWS: total number of rows in these tables (according to the optimizer statistics)
DV_DSO_GB: size of all DSOs (tables+indexes) in GB
DV_IC_GB: size of all InfoCubes (tables+indexes) in GB
DV_PSA_GB: size of the PSA (tables+indexes) in GB
DV_ADM_GB: size of all RS* tables+indexes in GB
DV_TOTAL_PARTS: total number of table+index partitions in Oracle
DV_EMPTY_PARTS: allocated partitions without any rows in them
DV_COLD_PARTS: partitions which have not been accessed for 42 days
DV_WARM_PARTS: partitions which have been accessed in the last 42 days
DV_COLD_GB: size of all segments which have not been accessed for 42 days
DV_WARM_GB: size of all segments which have been accessed in the last 42 days
Also, I collect some standard SQL metrics, both on Oracle (DBA_HIST_SQLSTAT) and HANA (HOST_SQL_PLAN_CACHE). In some later blog post I’ll examine whether some significant differences can be found between BWonOracle versus BWonHANA.
SELECTS: total number of SELECT statements on that day
AVG_SELECT: average runtime of a SELECT statement
INSERTS: total number of INSERT statements on that day
AVG_INSERT: average runtime of an INSERT statement
UPDATES: total number of UPDATE statements on that day
AVG_UPDATE: average runtime of an UPDATE statement
DELETES: total number of DELETE statements on that day
AVG_DELETE: average runtime of a DELETE statement
CALLS: total number of database procedure callls (PL/SQL or SQLscript)
AVG_CALL: average runtime of a database procedure call
LOG_GB: amount of log information written on that day
For BWonOracle, I collect some more available metrics on the database workload from DBA_HIST_SYSSTAT:
LOGICAL_READS, PHYSICAL_READS, DB_BLOCK_CHANGES, TABLE_SCANS
Finally I collect some metrics on BW process chains (RSPCLOGCHAIN and RSPCPROCESSLOG), SAP batchjobs (TBTCO) and workprocess utilization (/SDF/SMON_WPINFO):
PC_CHAINS: number of different process chains which were executed
PC_RUNS: number of process chains which have run
PC_TOTAL: total runtime of all process chains on that day
PC_AVG: average runtime of a process chain
PC_MED: medium runtime of a process chain
PC_FAIL: number of failed process chains on that day
BTC_JOBS: total number of batchjobs which ran on that day
BTC_SUM: total runtime of all batchjobs on that day
BTC_AVG: average runtime of a batchjob
BTC_MED: medium runtime of a batchjob
BTC_FAIL: number of failed batchjobs on that day
WP_DIA_AVG: number of DIA workprocesses which were in use on average
WP_DIA_PCT: percentage of the DIA workprocesses which were in use on average
WP_DIA_MAX: maximum number of DIA workprocesses which were active simultaneously
WP_BTC_AVG: number of BTC workprocesses which were in use on average
WP_BTC_PCT: percentage of the BTC workprocesses which were in use on average
WP_BTC_MAX: maximum number of BTC workprocesses which were active simultaneously
So far for the relevant metrics which I decided to collect. There is a handful of additional metrics, but I collect them solely for operative purposes. I just want to know my SAP BW systems in great detail.
68 Metrics later
To quickly sum up the metrics: At the beginning I didn’t know what was relevant for later on, so I simply collected what looked interesting and was relatively easy to collect. My motto was: The more metrics, the better. If I had access to detailed data, I collected the average and the median value for performance metrics. If I had access only to preaggregated data I had to be satisfied with the average value. It was a somewhat tedious task, but I had to lay a solid foundation for the next steps. Anyway, I started my automated data collection and was eager to explore the treasures.
No comments:
Post a Comment