Note that there are some explanatory texts on larger screens.

plurals
  1. POTwisted getPage(): process memory grow when requesting lot of pages
    primarykey
    data
    text
    <p>I am writing a script for contstant (each 30-120 sec) grabbing of information quering a large set of URLs (Icecast/Shoutcast servers status pages), about 500 urls. It works fine, but the python process resident size is constantly keep growing. I am sure it is endless grow as I left it running for several hours and it took 1.2Gb RES from initial 30Mb.</p> <p>I simplified the script for easy understanding to the following:</p> <pre><code>from twisted.internet import reactor from twisted.web.client import getPage from twisted.enterprise import adbapi def ok(res, url): print "OK: " + str(url) reactor.callLater(30, load, url) def error(res, url): print "FAIL: " + str(url) reactor.callLater(30, load, url) def db_ok(res): for item in res: if item[1]: print "ADDED: " + str(item[1]) reactor.callLater(30, load, item[1]) def db_error(res): print "Database error: " + str(res) reactor.stop() def load(url): d = getPage(url, headers={"Accept": "text/html"}, timeout=30) d.addCallback(ok, url) d.addErrback(error, url) dbpool = adbapi.ConnectionPool("MySQLdb", "host", "user", "passwd", db="db") q = dbpool.runQuery("SELECT id, url FROM stations") q.addCallback(db_ok).addErrback(db_error) reactor.run() </code></pre> <p>It grows the same as the original daemon and thus I localized the problem. I think it is related to twisted.web.client.getPage() somehow. In original daemon I have used twisted.manhole for heap evalutaions with meliae during run time, but do not see anything nasty.</p> <p>First meliae dump made in right after only 1 or 2 query cycles are finished:</p> <pre><code>Total 84313 objects, 188 types, Total size = 15.9MiB (16647235 bytes) Index Count % Size % Cum Max Kind 0 5806 6 4142800 24 24 786712 dict 1 28070 33 2223457 13 38 4874 str 2 612 0 1636992 9 48 3424 HTTPClientFactory 3 19599 23 1585720 9 57 608 tuple 4 643 0 720160 4 61 1120 DelayedCall 5 642 0 713904 4 66 1112 Client 6 617 0 691040 4 70 1120 Connector 7 639 0 577656 3 73 904 type 8 691 0 556576 3 77 1120 Deferred 9 3962 4 475440 2 80 120 function 10 3857 4 462840 2 82 120 code 11 3017 3 308192 1 84 4856 list 12 240 0 266880 1 86 1112 Method 13 2968 3 237440 1 87 80 instancemethod 14 612 0 215424 1 88 352 InsensitiveDict 15 217 0 211128 1 90 12624 module 16 2185 2 157320 0 91 72 builtin_function_or_method 17 107 0 119840 0 91 1120 HTTPPageGetter 18 343 0 117992 0 92 344 IcecastRadioStation 19 343 0 117992 0 93 344 HTTPExtractor </code></pre> <p>And top around that time:</p> <pre><code>VIRT RES SHR S %CPU %MEM TIME+ COMMAND 248m 27m 4152 R 92 1.6 0:09.21 python </code></pre> <p>Now we wait for a time and check again, this is the picture after 20 minutes running (around 40 query cycles):</p> <pre><code>Total 67428 objects, 188 types, Total size = 11.9MiB (12463799 bytes) Index Count % Size % Cum Max Kind 0 3865 5 3601624 28 28 786712 dict 1 23762 35 2002029 16 44 4874 str 2 16382 24 1346208 10 55 608 tuple 3 644 0 582176 4 60 904 type 4 174 0 554304 4 64 3424 HTTPClientFactory 5 456 0 510720 4 68 1120 DelayedCall 6 3963 5 475560 3 72 120 function 7 3857 5 462840 3 76 120 code 8 240 0 266880 2 78 1112 Method 9 237 0 263544 2 80 1112 Client 10 217 0 211128 1 82 12624 module 11 187 0 209440 1 84 1120 Connector 12 182 0 194624 1 85 1120 Deferred 13 1648 2 179696 1 87 3768 list 14 1530 2 122400 0 88 80 instancemethod 15 343 0 117992 0 89 344 IcecastRadioStation 16 343 0 117992 0 90 344 HTTPExtractor 17 1175 1 103400 0 90 88 weakref 18 1109 1 88720 0 91 80 wrapper_descriptor 19 75 0 83400 0 92 1112 InterfaceClass </code></pre> <p>And top:</p> <pre><code>VIRT RES SHR S %CPU %MEM TIME+ COMMAND 532m 240m 4152 S 54 13.7 4:02.64 python </code></pre> <p>According to meliae neither objects count is growing nor total size. But the process eat up 200Mb of resident memory during this 20 minutes.</p> <p>I also used valgrind on python, but no leaks are found. Any thoughts? </p> <p>I am using Python version 2.6.6, twisted version 10.2.0</p> <p><hr> Update #1: I also used valgrind massif to profile CPython memory usage, here is allocation tree for 99.93% of memory allocated:</p> <pre><code>99.93% (578,647,287B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. -&gt;94.69% (548,309,283B) 0x550819: O_cwrite (cStringIO.c:406) | -&gt;94.69% (548,309,283B) 0x55096F: O_write (cStringIO.c:436) | -&gt;94.69% (548,309,283B) 0x5A17F9: PyCFunction_Call (methodobject.c:81) | -&gt;94.69% (548,309,283B) 0x4D1373: call_function (ceval.c:3750) | -&gt;94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412) | -&gt;94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836) | -&gt;94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771) | -&gt;94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412) | -&gt;94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836) | -&gt;94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771) | -&gt;94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412) | -&gt;94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836) | -&gt;94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771) | -&gt;94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412) | -&gt;94.69% (548,309,283B) 0x4D1868: fast_function (ceval.c:3836) | -&gt;94.69% (548,309,283B) 0x4D1549: call_function (ceval.c:3771) | -&gt;94.69% (548,309,283B) 0x4CC2A2: PyEval_EvalFrameEx (ceval.c:2412) | -&gt;94.69% (548,309,283B) 0x4CEBB3: PyEval_EvalCodeEx (ceval.c:3000) | -&gt;94.69% (548,309,283B) 0x5A0DC6: function_call (funcobject.c:524) | -&gt;94.69% (548,309,283B) 0x4261E8: PyObject_Call (abstract.c:2492) | -&gt;94.69% (548,309,283B) 0x4D2870: ext_do_call (ceval.c:4063) | -&gt;94.69% (548,309,283B) 0x4CC4E3: PyEval_EvalFrameEx (ceval.c:2452) | -&gt;94.69% (548,309,283B) 0x4CEBB3: PyEval_EvalCodeEx (ceval.c:3000) | -&gt;94.69% (548,309,283B) 0x5A0DC6: function_call (funcobject.c:524) | -&gt;94.69% (548,309,283B) 0x4261E8: PyObject_Call (abstract.c:2492) | -&gt;94.69% (548,309,283B) 0x4D2870: ext_do_call (ceval.c:4063) | -&gt;94.69% (548,309,283B) 0x4CC4E3: PyEval_EvalFrameEx (ceval.c:2452) | -&gt;94.69% (548,309,283B) 0x4CEBB3: PyEval_EvalCodeEx (ceval.c:3000) | -&gt;94.69% (548,309,283B) 0x5A0DC6: function_call (funcobject.c:524) | -&gt;94.69% (548,309,283B) 0x4261E8: PyObject_Call (abstract.c:2492) </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload