Thursday, 26 October 2017

HANA Savepoint Analysis

1. What are savepoints?


◉ Savepoints are required to synchronize changes in memory with the persistence on disk. All modified pages of the row and column store are written to disk during a savepoint.
◉ Each SAP HANA host and service has its own savepoints.
◉ The data belonging to a savepoint represents a consistent state of the data on disk and remains untouched until the next savepoint operation has been completed.

2. When is a savepoint triggered?



Savepoint interval (automatic)

During normal operation, savepoints are triggered automatically once a predefined time has passed since the last savepoint. The interval between two consecutive savepoints is controlled by the following parameter:

global.ini -> [persistence] -> savepoint_interval_s

Its default value is 300, so savepoints are taken in intervals of 300 seconds (5 minutes).
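
As a rough illustration of the parameter lookup, the following Python sketch reads savepoint_interval_s from the text of a single global.ini file and falls back to the default of 300 seconds when the parameter is not set. The function name is hypothetical, and the sketch assumes the file can be parsed as a plain INI file; it looks at one file only, so it does not necessarily reflect the value that is effective in a running system.

```python
import configparser

# Default stated in the text above: 300 seconds (5 minutes).
DEFAULT_SAVEPOINT_INTERVAL_S = 300


def savepoint_interval_s(ini_text: str) -> int:
    """Return savepoint_interval_s from the text of one global.ini file.

    Hypothetical helper: falls back to the default when the
    [persistence] section or the parameter itself is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint(
        "persistence",
        "savepoint_interval_s",
        fallback=DEFAULT_SAVEPOINT_INTERVAL_S,
    )
```

For example, passing the text "[persistence]" followed by "savepoint_interval_s = 600" returns 600, while a file without this entry returns 300.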

System command (manual)

The following command can be used to execute a savepoint manually:
ALTER SYSTEM SAVEPOINT

Soft shutdown

A soft shutdown invokes a savepoint before the services are stopped.
A hard shutdown doesn’t trigger a savepoint. This can increase the subsequent restart time.

Backup

A global savepoint is performed before a data backup is started.
A savepoint is written after the backup of a specific service is finished.

Startup

After a consistent database state is reached during startup, a savepoint is performed.

Snapshots

Snapshots are savepoints that are preserved for longer use and so they are not overwritten by the next savepoint.


3. Helpful Views


◉ M_SAVEPOINT_STATISTICS: Global savepoint information per host and service
◉ M_SAVEPOINTS: Detailed information for individual savepoints
◉ M_SERVICE_THREADS, M_SERVICE_THREAD_SAMPLES, HOST_SERVICE_THREAD_SAMPLES: As of SAP HANA SPS 10, savepoint details are logged for THREAD_TYPE = 'PeriodicSavepoint'

4. Helpful SQL Scripts


SAP Note 1969700 – SQL statement collection for SAP HANA

◉ SQL: “HANA_IO_Savepoints”: Detailed information for individual savepoints
◉ SQL: “HANA_IO_Snapshots”: Snapshot information


5. Blocking Phase


The majority of the savepoint is performed online without holding a lock, but the finalization of the savepoint requires a lock. This step is called the blocking phase of the savepoint. It consists of two major subphases:

◉ WaitForLock (thread detail: enterCriticalPhase(waitForLock)): Before the critical phase is entered, the savepoint must acquire the ConsistentChangeLock. If this lock is held by other threads or transactions, the duration of this phase increases. At the same time, all other modifications on the underlying table, such as INSERT, UPDATE or DELETE, are blocked by the savepoint with ConsistentChangeLock.
◉ Critical (thread detail: processCriticalPhase): Once the ConsistentChangeLock is acquired, the actual critical phase is entered and the remaining I/O writes are performed in order to guarantee a consistent set of data on disk. During this time, other transactions aren’t allowed to perform changes on the underlying table and are blocked with ConsistentChangeLock.


6. Typical savepoint issues analysis


◉ Symptom: Long waitForLock phase (thread detail: enterCriticalPhase(waitForLock))
Long durations of the blocking phase (outside of the critical phase) are typically caused by SAP HANA internal lock contention. The following known scenarios exist: ConsistentChangeLock.
Starting with Rev. 102, you can configure the following parameter to trigger a runtime dump (SAP Note 2400007) when waiting to enter the critical phase takes longer than <seconds> seconds:
indexserver.ini -> [persistence] -> runtimedump_for_blocked_savepoint_timeout = <seconds>

◉ Symptom: Long critical phase (thread detail: processCriticalPhase)
Delays during the critical phase are often caused by problems in the disk I/O area.
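
The triage above can be expressed as a small lookup. The following Python sketch maps a savepoint thread detail string to the blocking subphase and its typical cause as described in this section. The function name and the returned wording are illustrative only; the matching relies solely on the two thread detail strings named above.

```python
from typing import Optional, Tuple


def classify_blocking_phase(thread_detail: str) -> Optional[Tuple[str, str]]:
    """Map a savepoint thread detail to (subphase, typical cause).

    Returns None when the detail matches neither blocking subphase.
    """
    # Ignore whitespace so "enterCriticalPhase (waitForLock)" also matches.
    detail = "".join(thread_detail.split())
    if "enterCriticalPhase(waitForLock)" in detail:
        return ("WaitForLock", "internal lock contention (ConsistentChangeLock)")
    if "processCriticalPhase" in detail:
        return ("Critical", "problems in the disk I/O area")
    return None
```

A detail of enterCriticalPhase(waitForLock) therefore points to lock contention, whereas processCriticalPhase points to the I/O area.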


7. Analyze the runtime dump


The runtime dump indexserver_<hostname>.30003.rtedump.<timestamp>.savepoint_blocked.trc is triggered by the parameter runtimedump_for_blocked_savepoint_timeout.
It can be analyzed from the following aspects.

The savepoint thread can be identified by its call stack, which contains “DataAccess::SavepointLock::lockExclusive”.

Other threads (for example SQL threads) that are waiting for the lock have a call stack that contains “DataAccess::SavepointSPI::lockSavepoint”.

Runtime dump: Section [SAVEPOINT_SHAREDLOCK_OWNERS]

In most cases the savepoint hangs because it cannot acquire the exclusive lock while other threads still hold shared locks. This section helps to identify which threads hold them.

SAVEPOINT_SHAREDLOCK_OWNERS lists the owners of shared ConsistentChangeLock locks. If a savepoint is blocked in the waitForLock phase (SAP Note 2100009), the blocking activities can be found in this section.

Example: In the following section, thread 298995 holds a shared lock. This prevents the savepoint from acquiring the exclusive lock, so the savepoint hangs.

[SAVEPOINT_SHAREDLOCK_OWNERS] Owners of shared SavepointLocks: (2017-10-10 11:18:13 112 Local)
96034[thr=298995]: JobWrk0145, TID: 4856, UTID: 1588661641, CID: -1, LCID: 0, parent: 299143, SQLUserName: “”, AppUserName: “”, AppName: “”, ConnCtx: —, StmtCtx: —, type: “JobWorker”, method: “”, detail: “”, command: “” at 0x00007efe63342e88 in ltt::string_base<char, ltt::char_traits<char> >::trim_(unsigned long)+0xb8 at string.hpp:683 (libhdbcs.so)
[OK]

Once you have the thread ID of the shared lock owner, search the dump for that thread ID to find its parent thread. In this example, the parent thread (299143) is the following:

107423[thr=299143]: MergedogMerger, TID: 4856, UTID: 1588661641, CID: -1, LCID: 0, parent: 299445, SQLUserName: “”, AppUserName: “”, AppName: “”, ConnCtx: —, StmtCtx: —, type: “MergedogMerger“, method: “”, detail: “3 of 3 table(s): SAPERP:/1LT/VF00094506“, command: “” at 0x00007efe4e645f59 in syscall+0x19 (libc.so.6)

The conclusion is that the merge of table /1LT/VF00094506 holds the shared lock. The next step is to check whether there is a problem with the merge of this table.
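
Searching for the owner thread and then for its parent can be tedious in a large dump. The following Python sketch automates that lookup under the assumption that thread lines look like the two excerpts above, with a [thr=<id>] marker, a thread name, a parent field and a quoted detail field. The names ThreadInfo, parse_threads and parent_chain are hypothetical helpers, not part of any SAP tool, and real dumps may contain line variants that this pattern does not cover.

```python
import re
from typing import Dict, List, NamedTuple


class ThreadInfo(NamedTuple):
    thr: int      # thread ID from [thr=...]
    name: str     # thread name, e.g. JobWrk0145
    parent: int   # parent thread ID
    detail: str   # content of the detail field, may be empty


# The dump excerpt uses curly quotes; accept straight quotes as well.
_QUOTE = '["\u201c\u201d]'
_THREAD_RE = re.compile(
    r"\[thr=(?P<thr>\d+)\]:\s*(?P<name>[^,]+),"
    r".*?parent:\s*(?P<parent>\d+)"
    r".*?detail:\s*" + _QUOTE + r"(?P<detail>.*?)" + _QUOTE
)


def parse_threads(dump_text: str) -> Dict[int, ThreadInfo]:
    """Collect all lines that match the assumed thread line layout."""
    threads: Dict[int, ThreadInfo] = {}
    for line in dump_text.splitlines():
        match = _THREAD_RE.search(line)
        if match:
            info = ThreadInfo(
                int(match["thr"]),
                match["name"].strip(),
                int(match["parent"]),
                match["detail"],
            )
            threads[info.thr] = info
    return threads


def parent_chain(threads: Dict[int, ThreadInfo], thr: int) -> List[ThreadInfo]:
    """Follow parent IDs from thr until a thread is not found in the dump."""
    chain: List[ThreadInfo] = []
    seen = set()
    while thr in threads and thr not in seen:  # 'seen' guards against cycles
        seen.add(thr)
        chain.append(threads[thr])
        thr = threads[thr].parent
    return chain
```

Applied to the two lines shown above, parent_chain(parse_threads(dump_text), 298995) yields the JobWorker thread followed by the MergedogMerger thread, whose detail field names the table being merged.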

Runtime dump: Section [STATISTICS] M_SAVEPOINTS_

Import the data of this view into Excel and sort by the columns “CRITICAL_PHASE_WAIT_TIME” and “CRITICAL_PHASE_DURATION”.

In this example, the CRITICAL_PHASE_WAIT_TIME is over 10 s, which is very long. This shows that the savepoint had to wait a long time for the exclusive lock.

A long “CRITICAL_PHASE_DURATION”, in contrast, indicates a problem with disk I/O.
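
The sorting step does not have to be done in a spreadsheet. The following Python sketch ranks savepoint rows by one of the two columns and keeps only the rows above a threshold. It assumes the M_SAVEPOINTS data has been saved as delimited text with a header row and integer values; the function name, the semicolon default for the delimiter and the threshold parameter are assumptions for illustration. The threshold must be given in the same unit as the exported column, which should be checked against the view documentation.

```python
import csv
import io
from typing import Dict, List


def rank_savepoints(
    csv_text: str,
    column: str = "CRITICAL_PHASE_WAIT_TIME",
    threshold: int = 0,
    delimiter: str = ";",
) -> List[Dict[str, str]]:
    """Return rows whose `column` value exceeds `threshold`, largest first.

    Hypothetical helper: expects a header row and integer values in `column`.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [row for row in reader if int(row[column]) > threshold]
    rows.sort(key=lambda row: int(row[column]), reverse=True)
    return rows
```

Calling the function once with the default column and once with column="CRITICAL_PHASE_DURATION" gives the two rankings discussed above: the first highlights savepoints that waited long for the lock, the second highlights savepoints with a long critical phase.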
