StackOverflow2013


POLAMP on CentOS 6 sporadic timeouts
<p>We have a few servers that were recently moved to a new provider (a well-known German one). Their configurations are identical: i7-2600 CPU, 16 GB RAM, 1 Gbit network cards (connected to the router at 100 Mbit).</p> <p>The OS is CentOS 6 and the application stack is LAMP (Apache 2.2.15, PHP 5.3.8 with APC 3.1.9, MySQL 5.5.18, and a Memcached daemon running on each machine).</p> <p>The PHP pages are called by a proxy component written in Java (100-300 times per second, depending on the number of users). There is no swapping on the servers and there are no warnings in /var/log/messages. Load average is about 0.5-1.0 on the application servers and 2.0-3.0 on the MySQL server. There are no bottlenecks in the application (we gather metrics, and the typical time needed to render a response is always around 0.015 seconds).</p> <p>The problem is the following: sporadically, we see runs of consecutive timeouts in the proxy component lasting 2-3 seconds. The timeouts are often 3000 ms, sometimes 9000 ms, and rarely 21000 ms (is this somehow connected to SYN packets?). This happens even when the proxy component is placed on the same machine as the PHP application (Apache+PHP).</p> <p>We also noticed that:</p> <ol> <li>During these timeouts, MySQL threads are in the 'Reading from net' state. 
</li> <li>During the timeouts, the Apache "status" page fills up quickly (within 1-3 seconds) with 'W' processes (all processes end up in 'W' status, with some in 'C').</li> <li>Timeouts mostly appear when traffic is increasing (in the evening), and the problem disappears when traffic starts to go down (evening to night).</li> <li>During the timeouts, load average rises to 5.0-20.0.</li> </ol> <p>Things I tried that did not help:</p> <ol> <li>Tuning many sysctl/net variables (somaxconn, buffers)</li> <li>Turning off the firewall</li> <li>Turning off APC (disabling its usage in the code)</li> <li>Switching to persistent connections (in PHP) and from the MySQL extension to MySQLi</li> </ol> <p>I have just found that iperf shows a drop in bandwidth during the timeouts:</p> <pre><code>------------------------------------------------------------
Client connecting to localhost, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 3] local 127.0.0.1 port 54006 connected with 127.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
...
[ 3] 266.0-266.5 sec 24.0 MBytes 402 Mbits/sec
[ 3] 266.5-267.0 sec 24.4 MBytes 410 Mbits/sec
[ 3] 267.0-267.5 sec 24.0 MBytes 402 Mbits/sec
[ 3] 267.5-268.0 sec 24.4 MBytes 410 Mbits/sec
[ 3] 268.0-268.5 sec 24.0 MBytes 402 Mbits/sec
[ 3] 268.5-269.0 sec 18.6 MBytes 312 Mbits/sec
[ 3] 269.0-269.5 sec 2.42 MBytes 40.6 Mbits/sec
[ 3] 269.5-270.0 sec 7.87 MBytes 132 Mbits/sec
[ 3] 270.0-270.5 sec 2.30 MBytes 38.5 Mbits/sec
[ 3] 270.5-271.0 sec 2.84 MBytes 47.7 Mbits/sec
[ 3] 271.0-271.5 sec 5.59 MBytes 93.8 Mbits/sec
[ 3] 271.5-272.0 sec 3.42 MBytes 57.4 Mbits/sec
[ 3] 272.0-272.5 sec 2.83 MBytes 47.5 Mbits/sec
[ 3] 272.5-273.0 sec 13.5 MBytes 227 Mbits/sec
[ 3] 273.0-273.5 sec 24.2 MBytes 407 Mbits/sec
[ 3] 273.5-274.0 sec 24.1 MBytes 404 Mbits/sec
[ 3] 274.0-274.5 sec 24.3 MBytes 408 Mbits/sec
...
</code></pre> <p>Note that only the iperf client was launched, with the command "iperf -c localhost -i0.5 -b5000000000 -t3000".</p> <p>What could cause such timeouts? Is this CentOS-related?</p> <p>Thanks, Arsen</p>
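<p>To illustrate why the 3000, 9000 and 21000 ms values look related to SYN packets: a minimal sketch, assuming the TCP stack waits an initial retransmission timeout of 3 seconds for a SYN-ACK and doubles that wait on every retry. The 3-second starting value is an assumption about the kernel in use, not something measured on these servers, and the function name is made up for this example.</p>

```python
def cumulative_syn_delays(initial_rto_ms=3000, retries=3):
    """Return the elapsed time (ms) at which each SYN retransmission is sent.

    Assumes plain exponential backoff: the wait doubles after every retry.
    The 3000 ms default is an assumption, not a value read from the system.
    """
    delays = []
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(retries):
        elapsed += rto          # wait one full timeout before retransmitting
        delays.append(elapsed)  # total delay seen by the connecting client
        rto *= 2                # exponential backoff for the next attempt
    return delays


if __name__ == "__main__":
    print(cumulative_syn_delays())
```

<p>Under those assumptions the cumulative delays are 3000, 9000 and 21000 ms, which would mean the connections that time out lost their first one, two or three SYN packets (or the matching SYN-ACKs).</p>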
