StackOverflow2013

Memory leaks hunting without valgrind
<p>I have a set of programs working together through about 48 GB of shared memory (IPC).</p>
<p>The programs run on Linux 3.6.0-rc5, are written in plain C and compiled with gcc. The load average on the main machine (24 cores) is 6.0, jumping to 16.0 every 10 seconds.</p>
<p>One proxy receives data from other machines over 0mq (3.2.3, ~1000 msgs/s from 12 machines in the same network) and writes it into shared memory. Many (&lt;50) workers read this data and do some calculations.</p>
<p>The proxy uses around 20% CPU. Every worker uses 1% CPU, jumping to 10% periodically.</p>
<p>All programs are written so that every allocation is done in init(), called when the program starts, and every free is done in destroy(), called before exit.</p>
<p>The repetitive code does not use malloc/calloc/free at all.</p>
<p>But both programs still leak, around 120-240 bytes per minute. This isn't much: memory is exhausted in 7-8 days and I just stop and start the process, but those leaked bytes eat at my mind every time the monitoring app reports such a restart :)</p>
<p>The bad part: I can't run valgrind because of the shared memory. It just stops while allocating/attaching the shared memory, and then everything starts crashing.</p>
<p>Trying to find the leak, I made a stripped-down version of the proxy. It does not leak, but I can't feed it the same amount of data.</p>
<p>Under gdb there are still no leaks, but the speed drops by around 2/3, so it may just not be fast enough to reproduce the error.</p>
<p>So the possible sources of the leak are:</p>
<ul>
<li>my code, but there is no malloc/calloc there, just pointer arithmetic, memcpy and memcmp</li>
<li>some standard library: glibc? syslog?</li>
<li>0mq when working with many sources (I don't think 1k msgs per second is too much traffic)</li>
</ul>
<p>Are there any other tools/libs/hacks that can help in such a situation?</p>
<p>Edit: Shivan Raptor asked about the code. The repetitive part is 5k lines of maths, without any allocations, as I mentioned.</p>
<p>But here are the start, the stop and the entry into the repetitive part:</p>
<pre><code>int ex_pollponies(void); // declared before use in main

int main(int argc, char **argv) {
    ida_init(argc, argv, PROXY);
    ex_pollponies(); // repetitive part
    ida_destroy();
    return 0;
}

// with some parts cut out
int ex_pollponies(void) {
    int i, rc;
    unsigned char buf[90];
    uint64_t fos[ROLLINGBUFFERSIZE];
    uint64_t bhs[ROLLINGBUFFERSIZE];
    int bfcnt = 0;
    uint64_t *fo;
    uint64_t *bh;

    while (1) {
        rc = zmq_poll(ex_in-&gt;poll_items, ex_in-&gt;count, EX_POLL_TIMEOUT);
        for (i = 0; i &lt; ex_in-&gt;count; i++) {
            if (ex_in-&gt;poll_items[i].revents &amp; ZMQ_POLLIN) {
                // max_size is defined in code not shown here.
                // zmq_recv returns -1 on error, so skip errors as well as empty messages.
                if (zmq_recv(ex_in-&gt;poll_items[i].socket, buf, max_size, 0) &lt;= 0)
                    continue;

                // take the next slot of the rolling buffers
                fo = &amp;fos[bfcnt];
                bh = &amp;bhs[bfcnt];
                bfcnt++;
                if (bfcnt &gt;= ROLLINGBUFFERSIZE)
                    bfcnt = 0;

                memcpy(fo, (void *)&amp;buf[1], sizeof(FRAMEOBJECT));
                memcpy(bh, &amp;buf[sizeof(FRAMEOBJECT)+1], sizeof(FRAMEHASH));

                // then store fo, bh into shared memory, with some adjusting and checking
                // storing around 1000 msgs every second, 16 bytes each. But the leak is only 200 bytes per minute.
            }
        }
    }
}
</code></pre>
<p>edit2:</p>
<p>I finally got valgrind working: I used only part of the data (6 GB) and it got through. It did not find any leaks. But while running it takes 100% CPU, and my program definitely did not handle all the incoming data, so it was not working under full load. This half-confirmed my last-hope guess: the leak is in the data exchange block. I found out about mtrace (part of libc). It helped me track down the ADDRESS of the leak: it is outside my code, in one of the threads. The only threads in my program are created by zeromq. Then I started playing with socket options (increased HWM and buffers) and the leak rate decreased, but it did not go away completely even with absurdly big values :(</p>
<p>So now I am 95% sure it is zeromq that leaks. I will try to find an answer on their mailing list.</p>
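<p>For readers who have not used mtrace, here is a minimal sketch of how glibc's mtrace can be switched on around a suspect section. It is not the code from the programs above: <code>leaky_iteration</code> and <code>trace_leaky_section</code> are hypothetical names, the 24-byte allocation is an arbitrary stand-in for a leak, and the log path is whatever the caller passes in. As far as I know, on glibc 2.34 and later the log is only written when <code>libc_malloc_debug.so.0</code> is preloaded; without it the calls still link and run but record nothing.</p>

```c
#define _GNU_SOURCE
#include <mcheck.h>   /* mtrace, muntrace (glibc-specific) */
#include <stdlib.h>   /* malloc, setenv */
#include <string.h>   /* memset */

/* Hypothetical stand-in for one pass of a worker loop.
 * It leaks its buffer on purpose so the trace has something to report. */
static size_t leaky_iteration(size_t n)
{
    char *p = malloc(n);
    if (p == NULL)
        return 0;
    memset(p, 0, n);
    return n;          /* p is deliberately never freed */
}

/* Runs `iterations` passes with malloc tracing enabled and returns the
 * number of bytes leaked on purpose. mtrace() reads MALLOC_TRACE when it
 * is called, so the variable has to be set first. */
size_t trace_leaky_section(const char *trace_path, int iterations)
{
    size_t leaked = 0;
    int i;

    setenv("MALLOC_TRACE", trace_path, 1);
    mtrace();                      /* start logging malloc/free calls */
    for (i = 0; i < iterations; i++)
        leaked += leaky_iteration(24);
    muntrace();                    /* stop logging */
    return leaked;
}
```

<p>Calling <code>trace_leaky_section("/tmp/mtrace_demo.log", 5)</code> from <code>main</code> and then running the <code>mtrace</code> script that ships with glibc on the binary and the log should list the allocations that were never freed, together with the caller addresses. That is the kind of address information mentioned in edit2.</p>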