One of the biggest challenges with SAP HANA databases is the duration of the startup. Here we have to distinguish between the time until the database is available and the time until the reload of the tables has finished.
In my career as a consultant I’ve seen a lot of HANA databases with pretty long startup times of more than 1 hour. This is also the reason why SAP introduced the fast restart option with SPS4. But which options are there to speed up which phase with which release? A lot of DBs are still not on SPS4 and cannot profit from the new features. And how can you identify the bottleneck?
To understand which phase is your pain point, we have to understand how the DB starts and which phase can be optimized.
In any case we assume that the DB was stopped normally, which includes a global savepoint at the end of the shutdown phase. The startup then runs through these steps:
1. Open the data files
2. Load the converter tables from the last savepoint (mapping of logical pages to physical pages in the data file)
3. Load the list of open transactions from the last savepoint
4. Load the RS (row store) tables
5. Replay the redo log entries
6. Roll back uncommitted transactions
7. Perform a global savepoint
Phases in detail
To give you an example, I set up a test environment with an 8 TB HANA DB on HANA 2.0 SPS3 Rev. 36.
Hint: To find the beginning of a startup, search the trace files for “==== Starting hdbindexserver“.
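# as <sid>adm
cdtrace                            # alias: change to the trace directory
grep "==== Starting hdbindex" *    # list every indexserver startup found in the trace files
view index*.trc                    # open the indexserver traces read-only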
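If you prefer a script over grep, the same search can be sketched in Python. This is only a minimal sketch: the marker string and the `index*.trc` pattern come from the commands above, while the function names and the assumed timestamp layout (`YYYY-MM-DD HH:MM:SS` somewhere on the marker line) are my own assumptions and may not match every trace format.

```python
import re
from pathlib import Path

MARKER = "==== Starting hdbindexserver"
# Assumption: the trace line carries a "YYYY-MM-DD HH:MM:SS" timestamp
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def find_startups(lines):
    """Return (line_number, timestamp_or_None) for every startup marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKER in line:
            match = TIMESTAMP.search(line)
            hits.append((number, match.group(0) if match else None))
    return hits


def scan_trace_dir(trace_dir, pattern="index*.trc"):
    """Scan all matching trace files in a directory for startup markers."""
    result = {}
    for path in sorted(Path(trace_dir).glob(pattern)):
        # errors="replace" keeps the scan alive if a trace has odd bytes
        with path.open(errors="replace") as handle:
            hits = find_startups(handle)
        if hits:
            result[path.name] = hits
    return result
```

Run from the trace directory, `scan_trace_dir(".")` returns one entry per trace file that contains at least one startup, so you can jump straight to the line number of the restart you want to analyze.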
Phase (trace source file) | Category | Description | Optimization options | Start | End | Duration |
PersistentSpaceImpl.cpp | PersistenceManager | load Converters | I/O performance | 09:24:10 | 09:24:53 | 00:00:43 |
FileIDMapping.cpp | ContainerDirectory | Load Container / FileID Mapping | I/O performance / number of containers | 09:25:27 | 09:31:20 | 00:05:53 |
VirtualFileStatsProxy.cpp | PMRestart | LOB Owner statistics from Virtual File | optimize LOB size or type (packed LOBs) | 09:31:25 | 10:02:30 | 00:31:05 |
AbsolutePageAccessImpl | RowStorePageAccess | Collecting RowStore Pages | reduce RS size (reorg) | 10:02:36 | 10:03:50 | 00:01:14 |
CheckpointMgr.cc | Service_Startup | RowStore Load | reduce RS size (reorg) | 10:02:36 | 10:04:47 | 00:02:11 |
RecoveryHandlerImp.cpp | Logger | Log recovery | I/O performance / number of open transactions | 10:04:55 | 10:04:57 | 00:00:02 |
PersistenceManagerImpl.cpp | PersistenceManager | PersistenceManager Status (only seen over 6TB) | reduce DB size / optimize I/O performance | 10:04:57 | 10:09:17 | 00:04:20 |
IntegrityCheckerTimer.h | Service_Startup | Consistency Check RowStore | reduce RS size (reorg) or disable indexserver.ini -> [row_engine] -> consistency_check_at_startup -> none | 10:09:18 | 10:14:14 | 00:04:56 |
TRexApiSystem.cpp | TableReload | ColumnStore Table Reload | tables_preloaded_in_parallel > 5 | 10:14:33 | 10:32:01 | 00:17:28 |
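To see at a glance which phase hurts most, the start and end times from the table can be turned into a ranking. The following is a small sketch: the timestamps are copied from the table, the helper names are made up for illustration, and it assumes that the whole startup happens on one calendar day (no midnight rollover).

```python
from datetime import datetime

# (trace source file, description, start, end) copied from the table
PHASES = [
    ("PersistentSpaceImpl.cpp", "load Converters", "09:24:10", "09:24:53"),
    ("FileIDMapping.cpp", "Load Container / FileID Mapping", "09:25:27", "09:31:20"),
    ("VirtualFileStatsProxy.cpp", "LOB Owner statistics from Virtual File", "09:31:25", "10:02:30"),
    ("AbsolutePageAccessImpl", "Collecting RowStore Pages", "10:02:36", "10:03:50"),
    ("CheckpointMgr.cc", "RowStore Load", "10:02:36", "10:04:47"),
    ("RecoveryHandlerImp.cpp", "Log recovery", "10:04:55", "10:04:57"),
    ("PersistenceManagerImpl.cpp", "PersistenceManager Status", "10:04:57", "10:09:17"),
    ("IntegrityCheckerTimer.h", "Consistency Check RowStore", "10:09:18", "10:14:14"),
    ("TRexApiSystem.cpp", "ColumnStore Table Reload", "10:14:33", "10:32:01"),
]


def seconds_between(start, end):
    """Elapsed seconds between two HH:MM:SS strings on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def rank_phases(phases):
    """Return (description, seconds) pairs, longest phase first."""
    timed = [(desc, seconds_between(start, end)) for _, desc, start, end in phases]
    return sorted(timed, key=lambda item: item[1], reverse=True)


def wall_clock(phases):
    """Seconds from the earliest start to the latest end."""
    return seconds_between(min(p[2] for p in phases), max(p[3] for p in phases))


if __name__ == "__main__":
    total = wall_clock(PHASES)
    for desc, secs in rank_phases(PHASES):
        share = 100 * secs / total
        print(f"{desc:40s} {secs // 60:3d}m {secs % 60:02d}s {share:5.1f}%")
```

For this test system the ranking puts the LOB owner statistics first (1865 seconds, a bit under half of the 4071 seconds of wall-clock time), followed by the column store reload. Note that the two row store phases overlap, so the individual durations do not simply add up to the wall-clock time.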
PersistentSpaceImpl.cpp / PersistenceManager
FileIDMapping.cpp / ContainerDirectory
=> in this phase the container directory is read from disk and memory structures for statistics are set up.
VirtualFileStatsProxy.cpp / PMRestart
=> This phase is single-threaded! Try to reduce the LOB files. There is room for some improvement here (parallelism), @SAP.
The reason for this long phase is 1.6 TB of hybrid LOBs.
AbsolutePageAccessImpl / RowStorePageAccess / CheckpointMgr.cc / Service_Startup
=> loading RS pages
RecoveryHandlerImp.cpp / Logger
PersistenceManagerImpl.cpp / PersistenceManager
=> here the global savepoint is written
IntegrityCheckerTimer.h / Service_Startup
=> RowStore consistency check, which can be controlled with the parameter consistency_check_at_startup:
“Configure row store consistency check during SAP HANA startup: This parameter is used to configure a row store consistency check via CHECK_TABLE_CONSISTENCY during SAP HANA startup. It’s a list parameter and the allowed values are ‘table’, ‘page’ and ‘index’, which perform consistency checks on ‘table’, ‘page’ and ‘index’ respectively. This consistency check can be disabled by setting the parameter value to ‘none’.”
TRexApiSystem.cpp / TableReload
=> The reload runs with 5 threads (tables_preloaded_in_parallel). With enough CPUs you can speed up this phase by using more parallel threads.
=> lazy reload finished
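Both tuning knobs mentioned above are ini parameters, so they can be set with an ALTER SYSTEM ALTER CONFIGURATION statement. The sketch below only builds the SQL text and does not connect to any database. The helper name is invented, the section `parallel` for tables_preloaded_in_parallel and the value 10 are assumptions on my side, so please check them against your revision before using them. Also keep in mind that disabling the consistency check trades a safety net for startup time.

```python
def alter_configuration_sql(ini_file, section, key, value, layer="SYSTEM"):
    """Build an ALTER SYSTEM ALTER CONFIGURATION statement as a string."""

    def quote(text):
        # Double single quotes so the text stays a valid SQL string literal
        return "'" + str(text).replace("'", "''") + "'"

    return (
        f"ALTER SYSTEM ALTER CONFIGURATION ({quote(ini_file)}, {quote(layer)}) "
        f"SET ({quote(section)}, {quote(key)}) = {quote(value)} WITH RECONFIGURE;"
    )


statements = [
    # Disable the row store consistency check at startup (see the table)
    alter_configuration_sql(
        "indexserver.ini", "row_engine", "consistency_check_at_startup", "none"
    ),
    # Assumption: section "parallel"; the value 10 is only an example
    alter_configuration_sql(
        "indexserver.ini", "parallel", "tables_preloaded_in_parallel", 10
    ),
]

if __name__ == "__main__":
    for statement in statements:
        print(statement)
```

The printed statements can be reviewed first and then executed with the SQL client of your choice, which keeps the change visible and repeatable across systems.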
Background info RS
Since HANA 1.0 SPS09 there is a hidden parameter keep_shared_memory_over_restart, which keeps the row store in memory even across a DB restart. Therefore you will see a process called hdbrsutil, which keeps the tables attached to the shared memory.
Background info CS
With HANA 2.0 SPS4 there were some changes in the startup of RS and CS tables. The load of LOBs is optimized and there is a new feature called the fast restart option. Since Rev. 35 it is also possible to use persistent memory (PMEM) if you have the right hardware (Cascade Lake).
SPS3 vs SPS4
◈ The comparison uses the exact same DB on both releases
◈ No load (= no application server online) on both systems while testing
◈ The startup was tested 3 times per release
◈ The best and the worst run were not used
◈ No parameter tuning
◈ SPS4 without the fast restart option
◈ Result: roughly 10 minutes faster until available with SPS4
◈ Improvements in the virtual file statistics with SPS4
◈ Reload of CS slower with SPS4