Sunday, 29 December 2019

Linux – Memory Management insights

Nowadays the Linux memory management of an SAP system (application server) or SAP HANA system is becoming more important: SAP's roadmap is clear (Linux as the only OS for HANA), so the number of Linux installations is rising steeply.

One of the worst things that can happen to such a system in terms of performance is swapping or paging. But are swapping and paging the same?

A lot of people mean paging when they are talking about swapping. Swapping is the older method of moving data from memory to disk. To swap a process means to move that entire process out of main memory and to the swap area on hard disk, whereby all pages of that process are moved at the same time.

With paging, when the kernel requires more main memory for an active process, only the least recently used pages of processes are moved to the swap space.

Most common Linux systems are mixed-mode systems that use both paging and swapping.

Many customers ask me, in the context of monitoring, whether the system's behavior is correct when the used memory is close to the physical memory size. As you may know, there is a big difference between how the Unix memory concept accounts for memory and how an application handles its own memory. Plain OS memory monitoring is therefore practically useless if you want to use it for monitoring HANA systems.

The best-known tools are top (installed by default), htop and nmon (available in most repositories).

With ps -ef or ps axu you get a static view of the current processes.

But these tools show a lot of memory values, such as VSZ/VSS (virtual set size), RSS/RES (resident set size) and SHR/SHM (shared memory).

Have you ever added up these values? In most cases the sum exceeds the physical memory size. So how can you determine the real usage of a process, and maybe of the complete system?

We will start from top to bottom to get some insights.

1. Complete system memory

◉ Buffer
◉ Cache
◉ Shared memory
◉ Slab

2. Individual process memory usage
3. Collect support details


The test system is a 16GB SLES12 SP4 application server with 20 workprocesses.

Complete system memory


The most popular way to see the complete memory consumption is the command:

free -m

Example:

             total       used       free     shared    buffers     cached
Mem:         16318      15745        573       6548        174       8062
-/+ buffers/cache:       7508       8810
Swap:        12283       2422       9861

Pretty simple to explain:

Total = physical memory
Used = used memory (incl. buffers/caches)
Shared = Shared memory (details see shared memory section)
Free = not allocated memory
Swap = used swap space on disk

◉ 15.7GB of 16GB are allocated, only 573MB are free
◉ 8.2GB of the allocated memory are buffers and caches (174MB + 8062MB); without them, 7.5GB are used

The real free memory is 8810MB. It results from free (573MB), buffers (174MB) and cached (8062MB) => 573+174+8062 = ~8810 MB
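The arithmetic behind the "-/+ buffers/cache" line can be reproduced with a few lines of Python. This is only a minimal sketch using the values from the example output; the function name memory_summary is my own choice and not part of any tool.

```python
def memory_summary(total, used, free, buffers, cached):
    """Split the 'Mem:' line of `free -m` (old output format, values in MB)."""
    fs_cache = buffers + cached  # buffers and cache as counted by free
    summary = {
        "used_without_cache": used - fs_cache,  # 'used' column of '-/+ buffers/cache'
        "free_with_cache": free + fs_cache,     # 'free' column of '-/+ buffers/cache'
    }
    # Sanity check: both parts together must give the physical memory again
    assert summary["used_without_cache"] + summary["free_with_cache"] == total
    return summary


if __name__ == "__main__":
    # Values taken from the example output above
    print(memory_summary(total=16318, used=15745, free=573,
                         buffers=174, cached=8062))
```

The results are within 1MB of what free prints, most likely a rounding effect. Keep in mind that "cached" also contains shared memory, so not all of it can really be released.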

Cache


With (page)cache and buffers it is the same as with paging and swapping: most people mix up these terms.

Pagecache is caching of file data. When a file is read from disk or network, the contents are stored in pagecache. No disk or network access is required, if the contents are up-to-date in pagecache.

Note: tmpfs and shared memory segments count toward pagecache!

Buffer


The buffercache is a type of pagecache for block devices (for example, /dev/sda). A file system typically uses the buffercache when accessing its on-disk metadata structures such as inode tables, allocation bitmaps, and so forth. Buffercache can be reclaimed similarly to pagecache.

These two were separate memory areas in Linux kernels < 2.2. In newer kernels (2.4+) they together form the pagecache, because the buffer cache writes its mapping of a block into a page. A more common, holistic term you may know is filesystem cache (FS cache).

Most of you may know that there is no need to panic if the memory used by the system is close to the physical memory size.

If you want to find out how much memory could be released by clearing the caches and buffers, you can use this command (as root):

sync; echo 3 > /proc/sys/vm/drop_caches

If some memory is not released, that is because of the shared memory. As already mentioned, shared memory segments count toward the pagecache, which means that this memory is shared by running processes and can't be released until all processes using it have ended.

There are a lot of kernel parameters to control this automatically. Please drop the caches manually only when the system is in trouble. Normally there is no need to do this, other than to make the OS monitoring look good.

For details you should check the meminfo:

cat /proc/meminfo

MemTotal:       16710208 kB
MemFree:          590720 kB
MemAvailable:    1662528 kB
Buffers:          178688 kB
Cached:          8069248 kB
SwapCached:        74560 kB
Active:         11600512 kB
Inactive:        1729024 kB
Active(anon):   10658816 kB
Inactive(anon):  1159424 kB
Active(file):     941696 kB
Inactive(file):   569600 kB
Unevictable:      150336 kB
Mlocked:          150336 kB
SwapTotal:      12578688 kB
SwapFree:       10097792 kB
Dirty:              1984 kB
Writeback:             0 kB
AnonPages:       5164224 kB
Mapped:          6361984 kB
Shmem:           6705536 kB
Slab:             380800 kB
SReclaimable:     112192 kB
SUnreclaim:       268608 kB
KernelStack:       11904 kB
PageTables:        42880 kB
[...]
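To estimate how much of the cache is really droppable, the shared memory (Shmem) can be subtracted from Buffers and Cached. The following is a rough sketch under that assumption, not an exact kernel calculation; the function names are my own.

```python
def parse_meminfo(text):
    """Return a dict {field: value} from /proc/meminfo style text (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def droppable_cache_kb(info):
    """Rough estimate: buffers and cache minus shared memory (Shmem)."""
    return info["Buffers"] + info["Cached"] - info["Shmem"]


if __name__ == "__main__":
    # Lines copied from the output above; on a live system use
    # open("/proc/meminfo").read() instead.
    sample = """Buffers:          178688 kB
Cached:          8069248 kB
Shmem:           6705536 kB
"""
    info = parse_meminfo(sample)
    print(droppable_cache_kb(info) // 1024, "MB")
```

For the test system this gives roughly 1.5GB, far less than the "cached" value suggests and in the same range as MemAvailable.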

Active, Active(anon), Active(file)

Recently used memory that will not be reclaimed unless necessary or on explicit request. Active is the sum of Active(anon) and Active(file):

Active(anon) tracks swap-backed memory. This includes private and shared anonymous mappings and private file pages after copy-on-write.

Active(file) tracks other file system backed memory.

Inactive, Inactive(anon), Inactive(file)

Less recently used memory that will usually be reclaimed first. Inactive is the sum of Inactive(anon) and Inactive(file):

Inactive(anon) tracks swap-backed memory. This includes private and shared anonymous mappings and private file pages after copy-on-write.

Inactive(file) tracks other file system backed memory.

Shared Memory


The shared memory concept is heavily used by the SAP workprocesses. Each workprocess has an exclusive memory footprint of about 200-300MB, even when it is not active. Most of the memory is shared between the processes. The other part is the heap/working data itself.

You can check this for the complete system with:

grep -i shmem /proc/meminfo
Shmem:           6705536 kB

=> For an application server the shared memory value is always high. Don't worry about it, it works as designed.

For each shm segment:

ipcs -a
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00004dc4 125173760  sapadm     760        40141728   1
0x00004dbe 125206529  root       777        702916     1
0x000027bd 125370370  sidadm     740        60000000   1
0x00000000 229379     root       600        2610       0
0x00000000 262148     sidadm     740        1024       1
0x0382be85 294917     sidadm     640        4096       2
0x00002796 327686     sidadm     740        131072000  1
0x00000000 360455     sidadm     740        1024       1
0x0382be84 393224     sidadm     640        4096       21
0x00002749 425993     sidadm     740        2048592    20
0x0000271a 655370     sidadm     740        124000000  21
[…]
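To see who owns how much shared memory, the "bytes" column can be summed per owner. This is a small sketch that assumes the column layout of the ipcs output shown above (key, shmid, owner, perms, bytes, nattch); the function name is hypothetical.

```python
from collections import defaultdict


def shm_bytes_by_owner(ipcs_text):
    """Sum the 'bytes' column of the shared memory section per owner."""
    totals = defaultdict(int)
    in_shm = False
    for line in ipcs_text.splitlines():
        if line.startswith("------"):
            # Section header: only the shared memory section is of interest
            in_shm = "Shared Memory Segments" in line
            continue
        fields = line.split()
        # Data rows: key shmid owner perms bytes nattch [status]
        if in_shm and len(fields) >= 6 and fields[0].startswith("0x"):
            totals[fields[2]] += int(fields[4])
    return dict(totals)


# Usage on a Linux system:
#   import subprocess
#   text = subprocess.run(["ipcs", "-a"], capture_output=True, text=True).stdout
#   print(shm_bytes_by_owner(text))
```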


To clean up after zombie processes and remove leftover shared memory segments, SAP provides the binary cleanipc, which is delivered with every SAP AS kernel. Normally this cleanup is done as part of each clean shutdown.

But you can also trigger it yourself:

cleanipc <instance number> remove

Note: Do not use it while the system or any of its processes is still up and running. End all processes and check with ps -fu <sidadm>

For each individual process:
cat /proc/<PID>/smaps

For SAP HANA systems the RowStore is also based on the shared memory concept. This is also the reason why those tables can't be paged out. It is one of the first data areas read into memory during startup, and it is held by the hdbrsutil process even when the HANA DB has been stopped.

Slab

– Memory allocation for internal data structures of the kernel –

Normally this area should not consume more than 2GB. For smaller systems (< 128GB memory) the slab memory consumption is around 500MB.

You can check it with:

grep -i slab -A 4 /proc/meminfo
Slab:             380800 kB
SReclaimable:     112192 kB
SUnreclaim:       268608 kB
KernelStack:       11904 kB
PageTables:        42880 kB

cat /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
SCTPv6                 2     42   1536   42    1 : tunables   24   12    8 : slabdata      1      1      0
SCTP                   0      0   1280   51    1 : tunables   24   12    8 : slabdata      0      0      0
nf_conntrack         211    762    256  254    1 : tunables  120   60    8 : slabdata      3      3      0
nfs_direct_cache       0      0    360  181    1 : tunables   54   27    8 : slabdata      0      0      0
nfs_inode_cache     7666  23373   1040   63    1 : tunables   24   12    8 : slabdata    371    371      0
rpc_inode_cache        0      0    640  102    1 : tunables   54   27    8 : slabdata      0      0      0
fscache_cookie_jar      4    799     80  799    1 : tunables  120   60    8 : slabdata      1      1      0
ext4_groupinfo_4k    360    448    144  448    1 : tunables  120   60    8 : slabdata      1      1      0
ext4_inode_cache    4136   5820   1080   60    1 : tunables   24   12    8 : slabdata     97     97      0
ext4_allocation_context      2    504    128  504    1 : tunables  120   60    8 : slabdata      1      1      0
[…]

Realtime monitoring command for kernel memory:

slabtop


◉ On the test system, slabtop shows 450MB active and nearly 700MB allocated

Individual process memory usage


I have to disappoint you: there is no easy way to determine the exact usage of a process without special tools or scripts. The resident memory (RSS) is a good indicator of the real usage, but it does not include swapped-out memory, and it counts shared memory in full for every process that uses it. The closest value is PSS (Proportional Set Size). Never heard of this term? It is a newer measurement concept.

VSZ / VSS :

The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out.

VIRT / VSZ / VSS = SWAP + RES

RSS / RES:

The non-swapped physical memory a task has used (incl. shared memory)

RES = CODE + DATA + SHM.

PSS:

Same as RSS but the shared memory will be tracked as a proportion used by the current process.

PSS = CODE + DATA + (SHM / <number of processes using SHM>)

Example:
25MB binary => 50% loaded
200MB shared libraries (= shared memory) => 80% loaded
50MB heap => 75% used / loaded
10 processes using the shared libs

VSZ: 25MB + 200MB + 50MB = 275MB
RSS: 25MB*0.5 + 200MB*0.8 + 50MB*0.75 = 210MB
PSS: 25MB*0.5 + 200MB*0.8/10 + 50MB*0.75 = 66MB
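The same example expressed as a short Python sketch. The tuple layout (size, loaded fraction, shared flag) and the function name are only illustrative assumptions; as in the example, only the shared libraries are treated as shared.

```python
def set_sizes(regions, sharing_processes):
    """regions: list of (size_mb, loaded_fraction, is_shared) tuples."""
    # VSZ counts every mapped region in full, loaded or not
    vsz = sum(size for size, _, _ in regions)
    # RSS counts the loaded part, shared memory in full
    rss = sum(size * loaded for size, loaded, _ in regions)
    # PSS divides the loaded shared part by the number of processes using it
    pss = sum(size * loaded / (sharing_processes if shared else 1)
              for size, loaded, shared in regions)
    return vsz, rss, pss


regions = [
    (25, 0.50, False),   # binary
    (200, 0.80, True),   # shared libraries
    (50, 0.75, False),   # heap
]
vsz, rss, pss = set_sizes(regions, sharing_processes=10)
print(f"VSZ={vsz}MB RSS={rss:.0f}MB PSS={pss:.0f}MB")
```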


But how can you determine such values? You can do this manually with /proc/<PID>/smaps, but depending on the process and its allocation areas this can be a lengthy analysis.

Quick and dirty analysis (replace <PID> with the process ID):

grep -e \- -e ^Size -e ^Rss -e ^Pss /proc/<PID>/smaps
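The grep output still has to be added up by hand. A minimal Python sketch that does the summing is shown here; it is not the analyzer script used later in this post, and the function name smaps_totals is my own.

```python
import re

# Matches only the plain Size/Rss/Pss lines, not KernelPageSize or SwapPss
FIELD = re.compile(r"^(Size|Rss|Pss):\s+(\d+) kB$")


def smaps_totals(smaps_text):
    """Add up Size, Rss and Pss (in kB) over all mappings of one process."""
    totals = {"Size": 0, "Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


# Usage on a Linux system (replace 1234 with a real PID):
#   with open("/proc/1234/smaps") as handle:
#       print(smaps_totals(handle.read()))
```

Reading the smaps file of another user's process usually requires root privileges.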

There are ready-made scripts for this in Python and Perl. Personally, I use the Python one because I'm more familiar with Python than with Perl. With these scripts you can analyze one process (= PID).

So I've written a small shell script to analyze more than just one PID. Normally you want to analyze the complete SAP system, which includes all workprocesses.

Code for free usage:

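#!/bin/bash
# Usage: ./pid_analyses.sh <search_term> <output_file>
p_name=$1
out=$2
# Edit this path to the location of the downloaded Python script
analyzer='<path-to-python-script-edit-here>/smaps_analyzer.py'
pid_name_file=/tmp/pid_name.out
pid_file=/tmp/pid.out
BAR='####################'
# Collect all matching processes, excluding grep and this script itself
ps -eo pid,comm,cmd | grep -- "$p_name" | grep -v grep | awk -v self="$$" '$1 != self' > "$pid_name_file"
awk '{print $1}' "$pid_name_file" > "$pid_file"
end=$(wc -l < "$pid_file")
i=0
while read -r pid
do
  # Match the PID column exactly, so that e.g. PID 56 does not also match 560
  cmd=$(awk -v p="$pid" '$1 == p' "$pid_name_file")
  {
    echo "##############"
    echo "SMAPS: $pid"
    echo "Process: $cmd"
    echo "##############"
    python "$analyzer" "/proc/$pid/smaps" Pss
  } >> "$out"
  i=$((i + 1))
  # Progress in percent and as a bar of up to 20 characters
  status=$((i * 100 / end))
  bar_len=$((i * ${#BAR} / end))
  echo -ne "\r${BAR:0:$bar_len} $status%"
  sleep 0.1
done < "$pid_file"
echo
rm -f "$pid_file" "$pid_name_file"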

Just edit the path to the Python script which you have downloaded before.

Means:

1. download python script linux_smap_analyzer.py
2. create bash script with execute rights including the content above
3. edit path in bash script
4. run it with search term and output path

Usage:

./scriptname.sh <search_term> <output>

./pid_analyses.sh dw.sap /tmp/pid_analyses.out

“dw.sap” is a short search term for the disp+work processes, but you can also use a PID or any other term that matches your application.

Here is an example output for workprocess 6 (W6) of a system called “SID” with instance number “00”:

##############
SMAPS: 560
Process:   560 SID_00_DIA_W6   dw.sapSID_D00 pf=/usr/sap/SID/SYS/profile/SID_D00_aldSIDd0
##############
all data
Pss             Rss             Size            name    other
268548 kB       2950720 kB      6434816 kB              /dev/zero (deleted)
247235 kB       247296 kB       309248 kB               [heap]
95080 kB        1100736 kB      1290240 kB              /SYSV00002716 (deleted)
38419 kB        323072 kB       1752000 kB              /SYSV00002738 (deleted)
17482 kB        17664 kB        83776 kB                unknown
11517 kB        217920 kB       255680 kB               /SYSV00002712 (deleted)
8910 kB         91200 kB        149440 kB               /SYSV00002718 (deleted)
8300 kB         84928 kB        84928 kB                /SYSV00002763 (deleted)
6476 kB         66176 kB        92288 kB                /SYSV00002746 (deleted)
4761 kB         56832 kB        88832 kB                /usr/sap/SID/D00/exe/disp+work
3472 kB         52480 kB        52544 kB                /SYSV00002739 (deleted)
2517 kB         44096 kB        44096 kB                /SYSV00002724 (deleted)
1478 kB         27200 kB        58816 kB                /SYSV00002759 (deleted)
960 kB          960 kB          1152 kB         [stack]
764 kB          16064 kB        34304 kB                /SYSV00002725 (deleted)
669 kB          6464 kB         12864 kB                /usr/sap/SID/hdbclient/libSQLDBCHDB.so
617 kB          6976 kB         121152 kB               /SYSV0000271a (deleted)
526 kB          8192 kB         13120 kB                /SYSV00002743 (deleted)
428 kB          4992 kB         850048 kB               /SYSV00002713 (deleted)
379 kB          3648 kB         4928 kB         /usr/sap/SID/D00/exe/libsapcrypto.so
269 kB          1536 kB         2688 kB         /usr/sap/SID/D00/exe/dbhdbslib.so
191 kB          2304 kB         176704 kB               /SYSV0000274e (deleted)
164 kB          1216 kB         1728 kB         /usr/sap/SID/D00/exe/dw_xml.so
135 kB          256 kB          21056 kB                /usr/sap/SID/D00/exe/dw_gui.so
114 kB          896 kB          1920 kB         /usr/sap/SID/D00/exe/libicuuc.so.50
101 kB          640 kB          5312 kB         /usr/sap/SID/D00/exe/dw_abp.so
94 kB           512 kB          2496 kB         /usr/sap/SID/D00/exe/libicui18n.so.50
93 kB           1152 kB         2240 kB         /usr/lib64/libstdc++.so.6.0.25
90 kB           384 kB          832 kB          /usr/sap/SID/D00/exe/dw_rndrt.so
88 kB           448 kB          2944 kB         /usr/sap/SID/D00/exe/dw_xtc.so
86 kB           1600 kB         2048 kB         /SYSV00002749 (deleted)
85 kB           1728 kB         1856 kB         /lib64/libc-2.22.so
83 kB           384 kB          1152 kB         /usr/sap/SID/D00/exe/libregex.so
75 kB           1088 kB         1088 kB         /SYSV00002714 (deleted)
73 kB           1024 kB         4160 kB         /SYSV00002751 (deleted)
71 kB           832 kB          4160 kB         /SYSV00002750 (deleted)
70 kB           256 kB          256 kB          /lib64/libgcc_s.so.1
68 kB           256 kB          256 kB          /lib64/libpthread-2.22.so
68 kB           192 kB          192 kB          /lib64/libnss_sss.so.2
68 kB           320 kB          320 kB          /lib64/ld-2.22.so
67 kB           192 kB          192 kB          /lib64/libnss_files-2.22.so
42 kB           576 kB          1216 kB         /usr/sap/SID/D00/exe/dw_stl.so
40 kB           640 kB          6144 kB         /SYSV00002722 (deleted)
30 kB           384 kB          20480 kB                /usr/sap/SID/D00/exe/libicudata.so.50
27 kB           576 kB          896 kB          /lib64/libm-2.22.so
9 kB            128 kB          192 kB          /SYSV00002744 (deleted)
8 kB            256 kB          256 kB          /lib64/libresolv-2.22.so
7 kB            128 kB          15680 kB                /usr/sap/SID/D00/exe/librender.so
7 kB            128 kB          1920 kB         /usr/sap/SID/D00/exe/libicuuc51.so
7 kB            128 kB          192 kB          /usr/sap/SID/D00/exe/libiculx51.so
7 kB            128 kB          448 kB          /usr/sap/SID/D00/exe/libicule51.so
7 kB            192 kB          192 kB          /lib64/libnss_dns-2.22.so
6 kB            192 kB          192 kB          /lib64/libdl-2.22.so
6 kB            64 kB           576 kB          /SYSV00002748 (deleted)
5 kB            128 kB          192 kB          /usr/lib64/libuuid.so.1.3.0
4 kB            128 kB          192 kB          /lib64/librt-2.22.so
4 kB            64 kB           64 kB           /SYSV00002761 (deleted)
4 kB            64 kB           64 kB           /SYSV00002717 (deleted)
3 kB            64 kB           64 kB           /usr/sap/SID/SIDadm/.hdb/sap-ald-138-s/SQLDBC.shm
3 kB            64 kB           21952 kB                /usr/sap/SID/D00/exe/libicudata51.so
3 kB            64 kB           64 kB           /SYSV0382be84 (deleted)
3 kB            64 kB           64 kB           /SYSV0000272e (deleted)
3 kB            64 kB           64 kB           /SYSV00002711 (deleted)
1 kB            128 kB          128 kB          [vdso]
stack maps
Pss             Rss             Size            name    other
960 kB          960 kB          1152 kB         [stack]
all so maps
Pss             Rss             Size            name    other
669 kB          6464 kB         12864 kB                /usr/sap/SID/hdbclient/libSQLDBCHDB.so
379 kB          3648 kB         4928 kB         /usr/sap/SID/D00/exe/libsapcrypto.so
269 kB          1536 kB         2688 kB         /usr/sap/SID/D00/exe/dbhdbslib.so
164 kB          1216 kB         1728 kB         /usr/sap/SID/D00/exe/dw_xml.so
135 kB          256 kB          21056 kB                /usr/sap/SID/D00/exe/dw_gui.so
101 kB          640 kB          5312 kB         /usr/sap/SID/D00/exe/dw_abp.so
90 kB           384 kB          832 kB          /usr/sap/SID/D00/exe/dw_rndrt.so
88 kB           448 kB          2944 kB         /usr/sap/SID/D00/exe/dw_xtc.so
85 kB           1728 kB         1856 kB         /lib64/libc-2.22.so
83 kB           384 kB          1152 kB         /usr/sap/SID/D00/exe/libregex.so
68 kB           256 kB          256 kB          /lib64/libpthread-2.22.so
68 kB           320 kB          320 kB          /lib64/ld-2.22.so
67 kB           192 kB          192 kB          /lib64/libnss_files-2.22.so
42 kB           576 kB          1216 kB         /usr/sap/SID/D00/exe/dw_stl.so
27 kB           576 kB          896 kB          /lib64/libm-2.22.so
8 kB            256 kB          256 kB          /lib64/libresolv-2.22.so
7 kB            128 kB          15680 kB                /usr/sap/SID/D00/exe/librender.so
7 kB            128 kB          1920 kB         /usr/sap/SID/D00/exe/libicuuc51.so
7 kB            128 kB          192 kB          /usr/sap/SID/D00/exe/libiculx51.so
7 kB            128 kB          448 kB          /usr/sap/SID/D00/exe/libicule51.so
7 kB            192 kB          192 kB          /lib64/libnss_dns-2.22.so
6 kB            192 kB          192 kB          /lib64/libdl-2.22.so
4 kB            128 kB          192 kB          /lib64/librt-2.22.so
3 kB            64 kB           21952 kB                /usr/sap/SID/D00/exe/libicudata51.so
all dex maps
Pss             Rss             Size            name    other
app so maps
Pss             Rss             Size            name    other
app lib so maps
Pss             Rss             Size            name    other
app dex maps
Pss             Rss             Size            name    other
avlive txav maps
Pss             Rss             Size            name    other
tbs maps
Pss             Rss             Size            name    other

map Pss total = 720927 Kb
map Vss total = 12039104 Kb
stacks Pss = 960 kB
stacks Vss = 1152 kB
all so map Pss = 2391 kB
all so map Vss = 99264 kB
all dex map Pss = 0 kB
all dex map Vss = 0 kB
app so map Rss
app so map Rss = 0 kB
app so map Vss = 0 kB
app dex map Rss
app dex map Rss = 0 kB
app  map Vss = 0 kB
app_tbs
tbs mem map Pss = 0 kB
avlive txav
tbs mem map Pss = 0 kB

To summarize the details of this process:

map Pss total = 720927 Kb => 720MB
map Vss total = 12039104 Kb => 12000MB
=> which means about 11.3GB are SHM (mostly the case) or swap

To analyze a bunch of processes in one output file:

grep "map Pss total" /tmp/pid_analyses.out | sort -gk 5
map Pss total = 1350 Kb
map Pss total = 20986 Kb
map Pss total = 77454 Kb
map Pss total = 140503 Kb
map Pss total = 220375 Kb
map Pss total = 247087 Kb
map Pss total = 267906 Kb
map Pss total = 307026 Kb
map Pss total = 320250 Kb
map Pss total = 369559 Kb
map Pss total = 720927 Kb
map Pss total = 737010 Kb
map Pss total = 749670 Kb
map Pss total = 752209 Kb
map Pss total = 763658 Kb
map Pss total = 783921 Kb
map Pss total = 794335 Kb
map Pss total = 794902 Kb
map Pss total = 895677 Kb
map Pss total = 912923 Kb

This means the system with 20 workprocesses currently needs about 9.8GB of memory.
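The total can be cross-checked by summing all "map Pss total" lines. This is a minimal sketch that assumes the line format of the output shown above; the function name is my own.

```python
import re


def total_pss_kb(report_text):
    """Sum every 'map Pss total = <n> Kb' line of the analyzer output."""
    return sum(int(n) for n in re.findall(r"map Pss total = (\d+) Kb", report_text))


# The 20 values from the sorted output above
values_kb = [1350, 20986, 77454, 140503, 220375, 247087, 267906, 307026,
             320250, 369559, 720927, 737010, 749670, 752209, 763658, 783921,
             794335, 794902, 895677, 912923]
report = "\n".join(f"map Pss total = {v} Kb" for v in values_kb)
total = total_pss_kb(report)
print(f"{total} kB = {total / 1000**2:.2f} GB")
```

On a real system you would pass the content of the output file (for example /tmp/pid_analyses.out) to total_pss_kb instead of the hard-coded values.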

Collect support details


To collect all relevant information for SAP / Linux support, please run the script sapsysinfo.sh, which is attached to SAP Note 617104.

Additionally, you can collect details with the command supportconfig on SLES and sosreport on RHEL.

Note: Please run these collection tools immediately, while the problem is occurring. Afterwards it is too late to reconstruct the scenario and the real root cause of your memory consumption!
