1. What are savepoints?
◉ Savepoints are required to synchronize in-memory changes with the persistence layer on disk. All modified pages of the row store and column store are written to disk during a savepoint.
◉ Each SAP HANA host and service has its own savepoints.
◉ The data belonging to a savepoint represents a consistent state of the data on disk and remains untouched until the next savepoint operation has been completed.
2. When is a savepoint triggered?
Savepoint interval (automatic)
During normal operation, savepoints are triggered automatically when a predefined time since the last savepoint has passed. The length of the time interval between two consecutive savepoints can be controlled with the following parameter:
global.ini -> [persistence] -> savepoint_interval_s
Its default value is 300, so savepoints are taken in intervals of 300 seconds (5 minutes).
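For example, the current value can be checked and adjusted as follows (a minimal sketch; the 'SYSTEM' layer and the value of 300 are just examples):
-- check the current setting (assumes the standard M_INIFILE_CONTENTS monitoring view)
SELECT FILE_NAME, SECTION, KEY, VALUE FROM M_INIFILE_CONTENTS WHERE SECTION = 'persistence' AND KEY = 'savepoint_interval_s';
-- change the interval on SYSTEM layer and apply it immediately
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('persistence', 'savepoint_interval_s') = '300' WITH RECONFIGURE;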
System command (manual)
The following command can be used to execute a savepoint manually:
ALTER SYSTEM SAVEPOINT
Soft shutdown
A soft shutdown invokes a savepoint before the services are stopped.
A hard shutdown doesn’t trigger a savepoint. This can increase the subsequent restart time.
Backup
A global savepoint is performed before a data backup is started.
A savepoint is written after the backup of a specific service is finished.
Startup
After a consistent database state is reached during startup, a savepoint is performed.
Snapshots
Snapshots are savepoints that are preserved for longer-term use, so they are not overwritten by the next savepoint.
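As a sketch, a snapshot can be created and later closed with the following statements (the comment text and external backup ID are placeholders; the BACKUP_ID has to be taken from the backup catalog):
-- prepare a database snapshot
BACKUP DATA CREATE SNAPSHOT COMMENT 'example storage snapshot';
-- after the storage copy is done, confirm the snapshot ...
BACKUP DATA CLOSE SNAPSHOT BACKUP_ID <backup_id> SUCCESSFUL 'external-backup-id';
-- ... or abandon it
BACKUP DATA CLOSE SNAPSHOT BACKUP_ID <backup_id> UNSUCCESSFUL 'reason';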
3. Helpful Views
View | Details
M_SAVEPOINT_STATISTICS | Global savepoint information per host and service
M_SAVEPOINTS | Detailed information for individual savepoints
M_SERVICE_THREADS, M_SERVICE_THREAD_SAMPLES, HOST_SERVICE_THREAD_SAMPLES | As of SAP HANA SPS 10, savepoint details are logged for THREAD_TYPE = 'PeriodicSavepoint'
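For example, the most recent savepoints of a service can be listed like this (a sketch based on the M_SAVEPOINTS columns referenced in this post; further columns are available):
-- last savepoints with their blocking / critical phase times
SELECT TOP 20 HOST, PORT, START_TIME, DURATION, CRITICAL_PHASE_WAIT_TIME, CRITICAL_PHASE_DURATION
FROM M_SAVEPOINTS
ORDER BY START_TIME DESC;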
4. Helpful SQL Scripts
SAP Note 1969700 – SQL statement collection for SAP HANA
SQL statement | Details
SQL: "HANA_IO_Savepoints" | Detailed information for individual savepoints
SQL: "HANA_IO_Snapshots" | Snapshot information
5. Blocking Phase
The majority of the savepoint is performed online without holding a lock, but the finalization of the savepoint requires a lock. This step is called the blocking phase of the savepoint. It consists of two major subphases:
Sub phase | Thread detail | Description
WaitForLock | enterCriticalPhase(waitForLock) | Before the critical phase is entered, a ConsistentChangeLock needs to be acquired by the savepoint. If this lock is held by other threads / transactions, the duration of this phase increases. At the same time, all other modifications on the underlying table such as INSERT, UPDATE or DELETE are blocked by the savepoint via the ConsistentChangeLock.
Critical | processCriticalPhase | Once the ConsistentChangeLock is acquired, the actual critical phase is entered and the remaining I/O writes are performed in order to guarantee a consistent set of data on disk. During this time other transactions are not allowed to perform changes on the underlying table and are blocked by the ConsistentChangeLock.
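The phase a running savepoint is currently in can, for example, be observed via the thread views listed above (a sketch; THREAD_DETAIL shows values such as enterCriticalPhase(waitForLock) or processCriticalPhase):
-- active savepoint threads and the phase they are currently in
SELECT HOST, PORT, THREAD_ID, THREAD_STATE, THREAD_DETAIL, DURATION
FROM M_SERVICE_THREADS
WHERE THREAD_TYPE = 'PeriodicSavepoint';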
6. Typical savepoint issues analysis
Symptoms | Thread detail | Description
Long waitForLock phase | enterCriticalPhase(waitForLock) | Long durations of the blocking phase (outside of the critical phase) are typically caused by SAP HANA internal lock contention, e.g. on the ConsistentChangeLock. Starting with Rev. 102 you can configure the following parameter to trigger a runtime dump (SAP Note 2400007) in case waiting to enter the critical phase takes longer than <seconds> seconds: indexserver.ini -> [persistence] -> runtimedump_for_blocked_savepoint_timeout = '<seconds>'
Long critical phase | processCriticalPhase | Delays during the critical phase are often caused by problems in the disk I/O area.
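A sketch of setting this parameter on the indexserver (the threshold of 600 seconds is only an example value):
-- write a runtime dump if entering the critical phase is blocked for more than 600 seconds
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') SET ('persistence', 'runtimedump_for_blocked_savepoint_timeout') = '600' WITH RECONFIGURE;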
7. Analyze the runtime dump
The runtime dump file indexserver_<hostname>.30003.rtedump.<timestamp>.savepoint_blocked.trc is triggered by the parameter runtimedump_for_blocked_savepoint_timeout.
You can check the runtime dump from the following aspects.
Find the savepoint thread: its Savepoint Callstack contains "DataAccess::SavepointLock::lockExclusive".
Find the other threads (SQL threads) waiting for the lock: their callstack contains "DataAccess::SavepointSPI::lockSavepoint".
Runtime dump: section [SAVEPOINT_SHAREDLOCK_OWNERS]
In most cases the savepoint hangs because the exclusive lock is blocked by another thread holding a shared lock. This section helps to find which thread is occupying the lock.
SAVEPOINT_SHAREDLOCK_OWNERS | Owners of shared ConsistentChangeLock locks | In case a savepoint is blocked in the waitForLock phase (SAP Note 2100009), the blocking activities can be found in this section.
Example: In the following output, you can see that thread id 298995 holds the shared lock, which blocks the exclusive lock and hangs the savepoint.
[SAVEPOINT_SHAREDLOCK_OWNERS] Owners of shared SavepointLocks: (2017-10-10 11:18:13 112 Local)
96034[thr=298995]: JobWrk0145, TID: 4856, UTID: 1588661641, CID: -1, LCID: 0, parent: 299143, SQLUserName: “”, AppUserName: “”, AppName: “”, ConnCtx: —, StmtCtx: —, type: “JobWorker”, method: “”, detail: “”, command: “” at 0x00007efe63342e88 in ltt::string_base<char, ltt::char_traits<char> >::trim_(unsigned long)+0xb8 at string.hpp:683 (libhdbcs.so)
[OK]
After you have the thread id of the shared lock owner, search for that thread id to find its parent thread. In this example, the parent thread is the following:
107423[thr=299143]: MergedogMerger, TID: 4856, UTID: 1588661641, CID: -1, LCID: 0, parent: 299445, SQLUserName: “”, AppUserName: “”, AppName: “”, ConnCtx: —, StmtCtx: —, type: “MergedogMerger“, method: “”, detail: “3 of 3 table(s): SAPERP:/1LT/VF00094506“, command: “” at 0x00007efe4e645f59 in syscall+0x19 (libc.so.6)
We can conclude that the delta merge of the table /1LT/VF00094506 is holding the shared lock. The next step is to check whether there is an issue with the merge of this table.
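To follow up on the merge, the recent delta merge history of that table can be checked, for example via M_DELTA_MERGE_STATISTICS (a sketch; the schema and table name are taken from the example above):
-- recent delta merges of the affected table
SELECT START_TIME, TYPE, MOTIVATION, EXECUTION_TIME, SUCCESS
FROM M_DELTA_MERGE_STATISTICS
WHERE SCHEMA_NAME = 'SAPERP' AND TABLE_NAME = '/1LT/VF00094506'
ORDER BY START_TIME DESC;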
Runtime dump: section [STATISTICS] M_SAVEPOINTS_
Import the data of this view into Excel and sort by the columns "CRITICAL_PHASE_WAIT_TIME" and "CRITICAL_PHASE_DURATION". In this example the CRITICAL_PHASE_WAIT_TIME is over 10 seconds, which is quite slow. This indicates a problem with the savepoint, specifically with acquiring the exclusive lock. If you also find long durations in "CRITICAL_PHASE_DURATION", this points to a problem in the disk I/O area.
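On a running system, the same sorting can be done directly in SQL instead of Excel (a sketch using the time columns shown in the dump section):
-- savepoints with the longest blocking and critical phases
SELECT TOP 20 HOST, PORT, START_TIME, CRITICAL_PHASE_WAIT_TIME, CRITICAL_PHASE_DURATION
FROM M_SAVEPOINTS
ORDER BY CRITICAL_PHASE_WAIT_TIME DESC, CRITICAL_PHASE_DURATION DESC;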