In this blog, I am providing step-by-step details about how to perform table redistribution and repartitioning in a BW on HANA system. Several steps described in this document are specific to a BW on HANA system.
In a scaled-out HANA system, tables and table partitions are spread across several hosts. The tables are distributed initially as part of the installation. Over time, as more and more data gets pumped into the system, the distribution may get skewed, with some hosts holding a very large amount of data and others holding much less. This leads to higher resource utilization on the overloaded hosts and can cause various problems there, such as high CPU utilization, frequent out-of-memory dumps, and increased redo log generation, which in turn can cause problems in system replication to the DR site. If the rate of redo log generation on an overloaded host is higher than the rate at which the logs are transferred to the DR site, the buffer full count increases and pressure is put on the replication network between the primary nodes and the corresponding nodes on the secondary.
The table redistribution and repartitioning operation applies advanced algorithms to ensure tables are distributed optimally across all the active nodes and are partitioned appropriately, taking into consideration several parameters, mainly:
➢ Number of partitions
➢ Memory Usage of tables and partitions
➢ Number of rows of tables and partitions
➢ Table classification
Apart from these, there are several other parameters that are considered by the internal HANA algorithms to execute the redistribution and repartitioning task.
DETERMINING WHETHER A REDISTRIBUTION AND REPARTITIONING IS NEEDED
You can consider the following to decide whether a redistribution and repartitioning is needed in the system.
➢ Execute the SQL script “HANA_Tables_ColumnStore_TableHostMapping” from OSS note 1969700. From the output of this query, you can see the total size of the tables on disk across all the hosts. In our system, as you can see in the screenshot below, the distribution was uneven, with some nodes holding too much data compared to others.
➢ Check whether frequent out-of-memory dumps are being generated on a few hosts due to high memory utilization by column store tables. You can execute the following SQL statement to see the memory space occupied by the column store tables on each host.
SELECT host, COUNT(*), ROUND(SUM(memory_size_in_total/1024/1024/1024)) AS size_gb FROM m_cs_tables GROUP BY host ORDER BY host;
As you can see in the screenshot below, in some hosts the memory space occupied by the column store tables is much higher compared to other hosts.
➢ If new hosts are added to the system, existing tables are not automatically distributed to them; only new tables created after the addition of the hosts may get stored there. The “optimize table distribution” activity needs to be carried out to distribute tables from the existing hosts to the new hosts.
➢ If too many of the following error messages are showing up in the indexserver trace file. The table redistribution and repartitioning activity also takes care of these issues.
Potential performance problem: Table ABC and XYZ are split by unfavorable criteria. This will not prevent the data store from working but it may significantly degrade performance.
Potential performance problem: Table ABC is split and table XYZ is split by an appropriate criterion but corresponding parts are located in different servers. This will not prevent the data store from working but it may significantly degrade performance.
CHECKLIST
✓ Update table “table_placement”
✓ Maintain parameters
✓ Grant permissions to SAP schema user
✓ Run consistency check report
✓ Run stored procedure to check native HANA table consistency and catalog
✓ Run the cleanup python script to clean virtual files
✓ Check whether there are any business tables created in row store. Convert them to column store
✓ Run the memorysizing python script
✓ Take a database backup
✓ Suspend crontab jobs
✓ Save the current table distribution
✓ Increase the number of threads for the execution of table redistribution
✓ Unregister secondary system from primary (if DR is setup)
✓ Stop SAP application
✓ Lock users
✓ Execute “optimize table distribution” operation
✓ Startup SAP
✓ Run compression of tables
STEPS IN DETAILS
Update table “table_placement”
OSS note 1908075 provides an attachment which has several scripts for different HANA versions and different scale out scenarios. Download the attachment and navigate to the folder as per your HANA version, number of slave nodes and amount of memory per node.
In the SQL script, replace the $$PLACEHOLDER with the SAP schema name of your system and execute the script. This will update the table “TABLE_PLACEMENT” under the SYS schema, which is referred to by the HANA algorithms when taking decisions during the table redistribution and repartitioning activity.
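After the script has run, you can verify its effect with a simple query (a sketch; the set of columns in this table varies by HANA revision):

```sql
-- Verify that the placement rules from OSS note 1908075 were inserted
SELECT * FROM "SYS"."TABLE_PLACEMENT";
```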
Maintain parameters
Maintain HANA parameters as recommended in OSS note 1958216 according to your HANA version.
Grant permissions to SAP schema user
For HANA 1.0 SPS10 onwards, ensure that the SAP schema user (SAPBIW in our case) has the system privilege “Table Admin”.
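For reference, the privilege can be granted with a statement like the following (assuming, as in our case, that the schema user is SAPBIW):

```sql
-- Grant the TABLE ADMIN system privilege to the SAP schema user
GRANT TABLE ADMIN TO SAPBIW;
```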
Run consistency check report
SAP provides an ABAP report “rsdu_table_consistency” specifically for SAP systems on HANA database. First, ensure that you apply the latest version of this report and apply the OSS note 2175148 – SHDB: Regard TABLE_PLACEMENT in schema SYS (HANA SP100) if your HANA version is >=SPS10. Otherwise you may get short dumps while executing this report if you select the option “CL_SCEN_TAB_CLASSIFICATION”.
Execute this report from SA38 especially by selecting the options “CL_SCEN_PARTITION_SPEC” and “CL_SCEN_TAB_CLASSIFICATION”. (You can select all the other options as well). If any errors are reported, fix those by running the report in repair mode.
Note: This report should be run after the table “table_placement” is maintained as described in the first step. This report refers to that table to determine and fix errors related to table classification.
Run stored procedure to check native HANA table consistency and catalog
Execute the stored procedures check_table_consistency and check_catalog for the non-BW tables and ensure there are no critical errors reported. If any critical errors are reported, fix those first.
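As a sketch, the procedures can be invoked as follows; verify the exact parameterization (the 'CHECK' action and the schema/table filters) against your HANA revision before running them:

```sql
-- Check a single non-BW table for consistency
CALL CHECK_TABLE_CONSISTENCY('CHECK', '<schema_name>', '<table_name>');

-- Check the catalog for inconsistencies
CALL CHECK_CATALOG('CHECK', NULL, NULL, NULL);
```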
Run the cleanup python script to clean extra virtual files
If there are extra virtual files, the table redistribution and repartitioning operation may fail. Run the python script cleanupExtraFiles.py available in the python_support directory to determine whether there are any extra virtual files. Before you run the script, open it in VI editor and modify the following parameters as per your system.
self.host, self.port, self.user and self.passwd
Execute the following command first to determine whether there are any extra virtual files.
python cleanupExtraFiles.py
If this command reports extra virtual files, execute the command again with the remove option to clean those up.
python cleanupExtraFiles.py --removeAll
Check whether there are any business tables created in row store
The table redistribution and repartitioning operation considers only column store tables, not row store tables. So, if anybody has created business tables in row store (by mistake or without knowing the implications), those will not be considered for this activity. Big BW tables should not be created in row store in the first place. Convert them to column store using the SQL below.
ALTER TABLE <table_name> COLUMN;
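To find the candidate tables first, a query like the following can list them and generate the conversion statements (a sketch assuming the SAP schema is SAPBIW; review the list before executing anything):

```sql
-- List row-store tables in the SAP schema and generate ALTER statements for them
SELECT 'ALTER TABLE "' || schema_name || '"."' || table_name || '" COLUMN;'
  FROM sys.tables
 WHERE is_column_table = 'FALSE'
   AND schema_name = 'SAPBIW';
```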
Run the memorysizing python script
Before running the actual “optimize table distribution” task, execute the below command –
CALL REORG_GENERATE(6, '');
This will generate the redistribution plan but not execute it. The number ‘6’ here is the algorithm id of “Balance landscape/Table” which gets executed by the “optimize table distribution” operation.
After the above procedure gets executed, in the same SQL console, execute the following query-
CREATE TABLE REORG_LOCATIONS_EXPORT AS (SELECT * FROM #REORG_LOCATIONS);
This will create the table REORG_LOCATIONS_EXPORT in the schema of the user with which you executed it. Execute the query: SELECT memory_used FROM reorg_locations_export
If you see the memory_used column has several negative numbers like shown in the screen shot below, it indicates there is a problem.
You can also execute the below query and check the output.
SELECT host, ROUND(SUM(memory_used/1024/1024/1024)) AS memory_used FROM reorg_locations_export GROUP BY host ORDER BY host;
If you get the output as shown in the screenshot below, it indicates that the memory statistics are not updated. If you execute the “optimize table distribution” operation now, the distribution won’t be even: some hosts may end up with a far larger number of tables and high memory usage, whereas others will have very few tables and very low memory usage.
This is due to a HANA internal issue which has been fixed in HANA 2.0 where an internal housekeeping algorithm corrects these memory statistics. As a workaround for HANA 1.0 systems, SAP provides a python script that you can find in the standard python_support directory. (Check OSS note 1698281 for more details about this script).
Note: This script should be executed during low system load (preferably on weekends).
After the script run finishes, generate a new plan using the same method as described above and create a new table for the reorg locations with a new name, say REORG_LOCATIONS_EXPORT_1. Execute the query SELECT memory_used FROM reorg_locations_export_1 — you will no longer see negative numbers in the memory_used column. Executing the query SELECT host, ROUND(SUM(memory_used/1024/1024/1024)) AS memory_used FROM reorg_locations_export_1 GROUP BY host ORDER BY host will now show a much better result. As you can see below, the values in the memory_used column are fairly even across all nodes after executing the memorysizing script, and there are no negative values.
Take a database backup
Take a database backup before executing the redistribution activity.
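For example, a complete data backup can be triggered from SQL (the file prefix here is only an example; adjust it to your backup destination and naming convention):

```sql
-- Complete data backup before the redistribution activity
BACKUP DATA USING FILE ('COMPLETE_BEFORE_REDIST');
```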
Suspend crontab jobs
Suspend jobs that you have scheduled in crontab, e.g. backup script.
Save the current table distribution
From HANA studio, go to landscape –> Redistribution. Click on save. This will generate the current distribution plan and save it in case it is needed to restore back to the original table distribution.
Increase the number of threads for the execution of table redistribution
For faster execution of the table redistribution and repartitioning operation, you can set the parameter indexserver.ini [table_redist] -> num_exec_threads to a high value like 100 or 200 based on the CPU capacity of your HANA system. This will increase the parallelization and speed up the operation. The value should not exceed the number of logical CPU cores of the hosts. The default value of this parameter is 20. Make sure to unset this parameter after the activity is completed.
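The parameter can be set and later unset with statements like these (the value 100 is only an example; size it against the CPU capacity of your hosts):

```sql
-- Increase parallelism for the table redistribution operation
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('table_redist', 'num_exec_threads') = '100' WITH RECONFIGURE;

-- After the activity completes, remove the override to fall back to the default (20)
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  UNSET ('table_redist', 'num_exec_threads') WITH RECONFIGURE;
```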
Unregister secondary system from primary (if DR is setup)
If you have system replication setup between primary and secondary sites, you will need to unregister secondary from primary. If you perform table redistribution and repartitioning with system replication enabled, it will slow down the activity.
Stop SAP application
Stop SAP application servers
Lock users
Lock all users in HANA except the system users (SYS, _SYS_REPO, SYSTEM, the SAP schema user, etc.). Ensure there are no active sessions in the database before you proceed to the next step. This will also ensure that SLT replication cannot happen; if you want, you can still deactivate SLT replication separately.
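One way to do this is to generate DEACTIVATE statements for all non-system users (a sketch assuming the SAP schema user is SAPBIW; review the generated list and extend the exclusions before executing it):

```sql
-- Generate lock statements for all users except the system/technical users
SELECT 'ALTER USER "' || user_name || '" DEACTIVATE USER NOW;'
  FROM sys.users
 WHERE user_name NOT IN ('SYS', '_SYS_REPO', 'SYSTEM', 'SAPBIW');
```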
Execute “optimize table distribution” operation
From HANA studio, go to landscape –> Redistribution. Select “Optimize table distribution” and click on execute. In the next screen under “Parameters” leave the field blank. This will ensure that table repartitioning will also be taken care of along with redistribution. However, if you want to run only redistribution without repartitioning, enter “NO_SPLIT” in the parameters field. Click on next to generate the reorg plan and then click on execute.
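Alternatively, the same operation can be triggered from the SQL console with the reorg procedures used earlier for plan generation (a sketch; on newer revisions the Studio wizard is the supported path, so verify this against your HANA version):

```sql
-- Generate the plan with algorithm 6 (balance landscape/table) and execute it
CALL REORG_GENERATE(6, '');
CALL REORG_EXECUTE(?);
```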
Monitor the redistribution and repartitioning operation
Run the SQL script “HANA_Redistribution_ReorganizationMonitor” from OSS note 1969700 to monitor the redistribution and repartitioning activity. You can also execute the below command to monitor the reorg steps
SELECT IFNULL("STATUS", 'PENDING'), COUNT(*)
FROM REORG_STEPS
WHERE reorg_id = (SELECT MAX(reorg_id) FROM REORG_OVERVIEW)
GROUP BY "STATUS";
Startup SAP
Start SAP application servers after the redistribution completes.
Run compression of tables
The changes in the partition specification of the tables as part of this activity leave tables in uncompressed form. Though the “optimize table distribution” process carries out compression as part of the activity, due to a bug tables can still be uncompressed after it completes, which leads to high memory usage. Compression will run automatically after the next delta merge on these tables, but you can also perform it manually. Execute the SQL scripts “HANA_Tables_ColumnStore_TablesWithoutCompressionOptimization” and “HANA_Tables_ColumnStore_ColumnsWithoutCompressionOptimization” from OSS note 1969700 to get the list of tables and columns that need compression. The output of these scripts provides the SQL queries for executing the compression.
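For an individual table, compression optimization can also be forced manually with a statement like the following (a sketch; the scripts from the OSS note generate the exact statements for you):

```sql
-- Force compression optimization for a single column store table
UPDATE "<schema_name>"."<table_name>" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE');
```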
Result
As an outcome of this activity, the distribution of tables evened out across all the hosts of the system, and the memory space occupied by column store tables also became more or less even. You can also see that the size of tables on the master node has reduced. This is because some of our BW projects had created big tables on the master node, which were moved to the slave nodes as part of this redistribution activity. This is the ideal scenario.
Size of tables on disk (before)
Size of tables on disk (after)
Count and memory consumption of tables (before)
Count and memory consumption of tables (after)
I hope this article helps everyone who is planning to perform this activity on a BW on HANA system.