Interpreting tcmalloc's MALLOCSTATS output
<p>I am trying to fix a performance problem in a multithreaded application that uses tcmalloc. Each thread creates a large number of objects, and my analysis is that the thread caches in tcmalloc cannot satisfy these allocations and often have to fetch memory from the central page heap. This is the output of my app with <strong>MALLOCSTATS=2</strong> for 4 threads.</p>
<blockquote> <pre><code>Total size of freelists for per-thread caches,
transfer cache, and central cache, by size class
------------------------------------------------
class   1 [     8 bytes ] :  2046 objs;   0.0 MiB;   0.0 cum MiB
class   2 [    16 bytes ] :  1023 objs;   0.0 MiB;   0.0 cum MiB
class   3 [    32 bytes ] :   507 objs;   0.0 MiB;   0.0 cum MiB
class   5 [    64 bytes ] :   511 objs;   0.0 MiB;   0.1 cum MiB
class   6 [    80 bytes ] :   204 objs;   0.0 MiB;   0.1 cum MiB
class   9 [   128 bytes ] :   128 objs;   0.0 MiB;   0.1 cum MiB
class  15 [   224 bytes ] :    73 objs;   0.0 MiB;   0.1 cum MiB
class  16 [   240 bytes ] :    68 objs;   0.0 MiB;   0.1 cum MiB
class  17 [   256 bytes ] :    64 objs;   0.0 MiB;   0.2 cum MiB
class  19 [   320 bytes ] :    47 objs;   0.0 MiB;   0.2 cum MiB
class  25 [   512 bytes ] :   352 objs;   0.2 MiB;   0.3 cum MiB
class  26 [   576 bytes ] :    28 objs;   0.0 MiB;   0.4 cum MiB
class  33 [  1024 bytes ] :  1072 objs;   1.0 MiB;   1.4 cum MiB
class  39 [  2048 bytes ] :   832 objs;   1.6 MiB;   3.0 cum MiB
class  45 [  4096 bytes ] :   276 objs;   1.1 MiB;   4.1 cum MiB
class  50 [  8192 bytes ] :     2 objs;   0.0 MiB;   4.1 cum MiB
------------------------------------------------
PageHeap: 16 sizes; 713.5 MiB free; 0.0 MiB unmapped
------------------------------------------------
  2 pages * 39 spans ~   0.6 MiB;   0.6 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
  4 pages * 19 spans ~   0.6 MiB;   1.2 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
  6 pages * 17 spans ~   0.8 MiB;   2.0 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
  8 pages *  6 spans ~   0.4 MiB;   2.4 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 10 pages *  4 spans ~   0.3 MiB;   2.7 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 12 pages *  2 spans ~   0.2 MiB;   2.9 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 14 pages *  2 spans ~   0.2 MiB;   3.1 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 16 pages *  2 spans ~   0.2 MiB;   3.3 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 20 pages *  1 spans ~   0.2 MiB;   3.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 28 pages *  1 spans ~   0.2 MiB;   3.7 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 30 pages *  2 spans ~   0.5 MiB;   4.2 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 34 pages *  1 spans ~   0.3 MiB;   4.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 44 pages *  2 spans ~   0.7 MiB;   5.1 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 76 pages *  1 spans ~   0.6 MiB;   5.7 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
 78 pages *  1 spans ~   0.6 MiB;   6.3 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
108 pages *  1 spans ~   0.8 MiB;   7.2 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
255 large * 15 spans ~ 706.3 MiB; 713.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
</code></pre> </blockquote>
<p>I don't really understand whether this output shows which thread caches are getting exhausted, if any. My conclusion that the thread caches are being exhausted comes from watching the program run under GDB and seeing it end up in tcmalloc code that calls the futex system call.</p>
<p><strong>UPDATE</strong> I also noticed that the per-thread cache figures do not change when the number of threads is increased or decreased. It is the page heap that grows.</p>
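<p>As a sanity check on how to read the first table, here is a rough sketch that recomputes the bytes held per size class from the dump above. It assumes the size-class lines always have the shape shown in the output; <code>SAMPLE</code>, <code>LINE_RE</code> and <code>freelist_bytes</code> are illustrative names of mine, not part of tcmalloc.</p>

```python
import re

# Size-class lines copied from the MALLOCSTATS=2 dump in the question.
SAMPLE = """\
class 1 [ 8 bytes ] : 2046 objs; 0.0 MiB; 0.0 cum MiB
class 2 [ 16 bytes ] : 1023 objs; 0.0 MiB; 0.0 cum MiB
class 3 [ 32 bytes ] : 507 objs; 0.0 MiB; 0.0 cum MiB
class 5 [ 64 bytes ] : 511 objs; 0.0 MiB; 0.1 cum MiB
class 6 [ 80 bytes ] : 204 objs; 0.0 MiB; 0.1 cum MiB
class 9 [ 128 bytes ] : 128 objs; 0.0 MiB; 0.1 cum MiB
class 15 [ 224 bytes ] : 73 objs; 0.0 MiB; 0.1 cum MiB
class 16 [ 240 bytes ] : 68 objs; 0.0 MiB; 0.1 cum MiB
class 17 [ 256 bytes ] : 64 objs; 0.0 MiB; 0.2 cum MiB
class 19 [ 320 bytes ] : 47 objs; 0.0 MiB; 0.2 cum MiB
class 25 [ 512 bytes ] : 352 objs; 0.2 MiB; 0.3 cum MiB
class 26 [ 576 bytes ] : 28 objs; 0.0 MiB; 0.4 cum MiB
class 33 [ 1024 bytes ] : 1072 objs; 1.0 MiB; 1.4 cum MiB
class 39 [ 2048 bytes ] : 832 objs; 1.6 MiB; 3.0 cum MiB
class 45 [ 4096 bytes ] : 276 objs; 1.1 MiB; 4.1 cum MiB
class 50 [ 8192 bytes ] : 2 objs; 0.0 MiB; 4.1 cum MiB
"""

# Matches: class <id> [ <object size> bytes ] : <count> objs
LINE_RE = re.compile(
    r"class\s+(\d+)\s+\[\s*(\d+)\s+bytes\s*\]\s*:\s*(\d+)\s+objs"
)


def freelist_bytes(stats_text):
    """Map size class -> (object_size, object_count, bytes_held)."""
    result = {}
    for match in LINE_RE.finditer(stats_text):
        cls, size, objs = (int(g) for g in match.groups())
        result[cls] = (size, objs, size * objs)
    return result


per_class = freelist_bytes(SAMPLE)
total_bytes = sum(held for _, _, held in per_class.values())

# Print the size classes holding the most free memory first.
for cls, (size, objs, held) in sorted(
    per_class.items(), key=lambda item: item[1][2], reverse=True
):
    print(f"class {cls:3d}: {objs:5d} x {size:5d} B = {held / 2**20:.2f} MiB")

print(f"total on freelists: {total_bytes / 2**20:.1f} MiB")
```

<p>The recomputed total agrees with the final "cum MiB" value in the dump (4.1 MiB), which is tiny next to the 713.5 MiB the page heap reports as free. Going by the table's own heading, these figures are summed over the per-thread caches, the transfer cache and the central cache, so on their own they do not seem able to tell you which individual thread cache is running dry.</p>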