Note that there are some explanatory texts on larger screens.

plurals
  1. POApache Bench shows high 'connect' time
    primarykey
    data
    text
    <p>I'm trying to do some really simple benchmarking on our API server with apache bench, I know it's not an ideal tool to load-test webservers, but that's not the point.</p> <p>We have a couple of servers and a couple of VMs (hosted on those servers). All of them running the same stack of nginx, uwsgi and a cherrypy webapp. I can apachebench on a new server (no VM), and the results are as follows:</p> <pre><code>Server Software: CherryPy/3.2.2 Server Hostname: 10.0.2.4 Server Port: 5000 Document Path: /api/v4/ping Document Length: 245 bytes Concurrency Level: 1 Time taken for tests: 10.100 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 397000 bytes HTML transferred: 245000 bytes Requests per second: 99.01 [#/sec] (mean) Time per request: 10.100 [ms] (mean) Time per request: 10.100 [ms] (mean, across all concurrent requests) Transfer rate: 38.38 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 1 4 54.7 1 1002 Processing: 3 6 2.0 6 34 Waiting: 3 6 2.0 6 34 Total: 4 10 54.8 7 1008 Percentage of the requests served within a certain time (ms) 50% 7 66% 8 75% 8 80% 8 90% 9 95% 9 98% 11 99% 12 100% 1008 (longest request) </code></pre> <p>The weird thing here is that requests are mostly served within 12ms, as you can see. But some requests are taking over ten times longer. Apparently the problem is in the 'connect' phase, whatever that is.</p> <p>I can reliably reproduce this effect from any server or VM querying any other server or VM. But, queries against localhost are always fast.</p> <p>Also, the slows requests seem to be uniformly distributed, even when executing only 50 requests with ab (concurrency setting 1), I get some slow requests (sometimes it takes a couple of attempts though).</p> <p>To further pinpoint the problem I know just benchmark against nginx's hello world page which shows the exact same behavior (so it's not our app).</p> <p>When I first noticed this problem I also noticed a huge number of 'TIME_WAIT' connections, I tried tweaking some sysctl flags, the TIME_WAITs are now a lot less, but hte problem persists.</p> <p>I've tried all manner of sysctl settings to no avail.</p> <p>All servers are moderately new machines with Intel processors, they all run Ubuntu server, but different versions: 10.10, 12.10 and 13.04.</p> <p>For completeness sake, I tried to run the same benchmark against nginx's frontpage on two Google Compute Engine VMs, they do not display the same behavior.</p> <p>Any help would be appreciated, if any information is missing I'd gladly amend my post :).</p> <p>Edit:</p> <p>After syncing the clocks with ntpdate I ran a Python script that just opens a socket from host-3 to host-4 and closes it again. If the process took longer then 0.5s then the exact timeframe is logged. I ran this while running tcpdump on both servers, and this is some of the output, notice that at some point it takes about a second to transfer some packets. Problem is, I don't really know how to interpret these results.</p> <p>Host 3:</p> <pre><code>13:24:59.591046 IP (tos 0x0, ttl 64, id 17736, offset 0, flags [DF], proto TCP (6), length 60) sm-host-3.45573 &gt; sm-host-4.5000: Flags [S], cksum 0x3b59 (incorrect -&gt; 0x69fa), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581318 ecr 90980708,nop,wscale 7], length 0 13:24:59.591110 IP (tos 0x0, ttl 60, id 2866, offset 0, flags [DF], proto TCP (6), length 52) sm-host-4.5000 &gt; sm-host-3.45572: Flags [F.], cksum 0x4d59 (correct), seq 1, ack 2, win 114, options [nop,nop,TS val 90980709 ecr 406581318], length 0 13:24:59.591127 IP (tos 0x0, ttl 64, id 30162, offset 0, flags [DF], proto TCP (6), length 52) sm-host-3.45572 &gt; sm-host-4.5000: Flags [.], cksum 0x3b51 (incorrect -&gt; 0x4d59), seq 2, ack 2, win 114, options [nop,nop,TS val 406581318 ecr 90980709], length 0 13:25:00.588453 IP (tos 0x0, ttl 64, id 17737, offset 0, flags [DF], proto TCP (6), length 60) sm-host-3.45573 &gt; sm-host-4.5000: Flags [S], cksum 0x3b59 (incorrect -&gt; 0x6900), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581568 ecr 90980708,nop,wscale 7], length 0 13:25:00.588766 IP (tos 0x0, ttl 60, id 0, offset 0, flags [DF], proto TCP (6), length 60) sm-host-4.5000 &gt; sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0 13:25:00.588814 IP (tos 0x0, ttl 64, id 17738, offset 0, flags [DF], proto TCP (6), length 52) sm-host-3.45573 &gt; sm-host-4.5000: Flags [.], cksum 0x3b51 (incorrect -&gt; 0xa758), seq 1, ack 1, win 114, options [nop,nop,TS val 406581568 ecr 90980958], length 0 13:25:00.588821 IP (tos 0x0, ttl 60, id 0, offset 0, flags [DF], proto TCP (6), length 60) sm-host-4.5000 &gt; sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0 </code></pre> <p>Host-4 (around the same time frame):</p> <pre><code>13:24:59.587624 IP (tos 0x0, ttl 60, id 11, offset 0, flags [DF], proto TCP (6), length 52) sm-host-3.45569 &gt; sm-host-4.5000: Flags [.], cksum 0x54c4 (correct), seq 2, ack 2, win 114, options [nop,nop,TS val 406581318 ecr 90980708], length 0 13:24:59.587630 IP (tos 0x0, ttl 60, id 30162, offset 0, flags [DF], proto TCP (6), length 52) sm-host-3.45572 &gt; sm-host-4.5000: Flags [.], cksum 0x4d59 (correct), seq 2, ack 2, win 114, options [nop,nop,TS val 406581318 ecr 90980709], length 0 13:25:00.585005 IP (tos 0x0, ttl 60, id 17736, offset 0, flags [DF], proto TCP (6), length 60) sm-host-3.45573 &gt; sm-host-4.5000: Flags [S], cksum 0x69fa (correct), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581318 ecr 90980708,nop,wscale 7], length 0 13:25:00.585026 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60) sm-host-4.5000 &gt; sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0 13:25:00.585032 IP (tos 0x0, ttl 60, id 17737, offset 0, flags [DF], proto TCP (6), length 60) sm-host-3.45573 &gt; sm-host-4.5000: Flags [S], cksum 0x6900 (correct), seq 2690430222, win 14480, options [mss 1460,sackOK,TS val 406581568 ecr 90980708,nop,wscale 7], length 0 13:25:00.585039 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60) sm-host-4.5000 &gt; sm-host-3.45573: Flags [S.], cksum 0x4168 (correct), seq 2011803549, ack 2690430223, win 14480, options [mss 1460,sackOK,TS val 90980958 ecr 406581318,nop,wscale 7], length 0 13:25:00.585331 IP (tos 0x0, ttl 60, id 17738, offset 0, flags [DF], proto TCP (6), length 52) sm-host-3.45573 &gt; sm-host-4.5000: Flags [.], cksum 0xa758 (correct), seq 1, ack 1, win 114, options [nop,nop,TS val 406581568 ecr 90980958], length 0 </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload