<p>Most Linux distributions have <a href="http://en.wikipedia.org/wiki/Ephemeral_port" rel="nofollow">28232 ephemeral ports available</a>. A server needs one ephemeral port for each connection in order to free up the primary port (e.g. the HTTP server's port 80) for new connections.</p>
<p>So it would seem that if the server is handling 1000 requests/sec for content generated by php5-fpm over TCP, you are allocating 2000 ports/sec. In practice this is not really the case: the traffic is more likely 5% PHP and 95% static (no port allocation), and IIRC nginx&lt;-&gt;php-fpm keeps ports open for subsequent requests. Many factors can affect these numbers, but for argument's sake, let's say 1000 port allocations/sec.</p>
<p>On the surface this does not seem like a problem, but by default ports are not released immediately and made available for new connections. There are various reasons for this behavior, and I highly recommend a thorough understanding of <a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol" rel="nofollow">TCP</a> before arbitrarily making the changes detailed here (or anywhere else).</p>
<p>The main culprit is a connection state called TIME_WAIT ("socket is waiting after close to handle packets still in the network", per the <a href="http://linux.die.net/man/8/netstat" rel="nofollow">netstat man page</a>), which keeps ports from being released for reuse. On recent (all?) Linux kernels TIME_WAIT is hard-coded to 60 seconds, and according to <a href="http://tools.ietf.org/html/rfc793" rel="nofollow">RFC793</a> a connection may stay in TIME_WAIT for up to four minutes (twice the two-minute maximum segment lifetime)!</p>
<p><strong>This means that every second, another 1000 ports are tied up for at least 60 seconds</strong>. In the real world, you need to account for transit time, keep-alive requests (multiple requests use the same connection), and service ports (between nginx and the backend server). Let's arbitrarily knock it down to 750 ports/sec.</p>
<p>In ~37 seconds all your available ports will be used up (28232 / 750 ≈ 37). That's a problem, because it takes 60 seconds to release a port!</p>
<p>To see all the ports in use, run <a href="http://httpd.apache.org/docs/2.2/programs/ab.html" rel="nofollow">apache bench</a> or a similar tool that can generate the number of requests per second you are tuning for. Then run:</p>
<pre><code>root:~# netstat -n -t -o | grep timewait
</code></pre>
<p>You'll get output like this (but with many, many more lines):</p>
<pre><code>tcp 0 0 127.0.0.1:40649 127.1.0.2:80 TIME_WAIT timewait (57.58/0/0)
tcp 0 0 127.1.0.1:9000 127.0.0.1:50153 TIME_WAIT timewait (57.37/0/0)
tcp 0 0 127.0.0.1:40666 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40650 127.1.0.2:80 TIME_WAIT timewait (57.58/0/0)
tcp 0 0 127.0.0.1:40662 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40663 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40661 127.1.0.2:80 TIME_WAIT timewait (57.61/0/0)
</code></pre>
<p>For a running total of allocated ports (the count includes netstat's two header lines):</p>
<pre><code>root:~# netstat -n -t -o | wc -l
</code></pre>
<p>If you're receiving failed requests, the number will be at or close to 28232.</p>
<p><strong>How to solve the problem?</strong></p>
<ol>
<li><p>Increase the number of ephemeral ports from 28232 to 63977.</p>
<pre><code>sysctl -w net.ipv4.ip_local_port_range="1024 65000"
</code></pre></li>
<li><p>Allow Linux to reuse TIME_WAIT ports before the timeout expires.</p>
<pre><code>sysctl -w net.ipv4.tcp_tw_reuse="1"
</code></pre></li>
<li><p>Use additional IP addresses.</p></li>
</ol>
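The exhaustion estimate in the answer is a rate-times-lifetime calculation: every closed connection parks its port in TIME_WAIT for about 60 seconds, so in steady state roughly rate × 60 ports are held at once. The sketch below reproduces that arithmetic under the answer's simplifying assumption that each new connection consumes one ephemeral port for the full TIME_WAIT period; the helper names are hypothetical, and the 750 allocations/sec figure is the answer's rough guess, not a measurement.

```python
# Back-of-the-envelope model of ephemeral port exhaustion caused by TIME_WAIT.
# Assumption: every new connection takes one ephemeral port, and that port
# cannot be reused until TIME_WAIT_SECONDS after the connection closes.

TIME_WAIT_SECONDS = 60  # Linux TIME_WAIT length, as stated in the answer


def ports_held_in_time_wait(allocations_per_sec: float,
                            time_wait_s: float = TIME_WAIT_SECONDS) -> float:
    """Steady-state number of ports stuck in TIME_WAIT (rate x lifetime)."""
    return allocations_per_sec * time_wait_s


def seconds_until_exhausted(available_ports: int,
                            allocations_per_sec: float) -> float:
    """Time until the pool runs dry if no port is released in the meantime."""
    return available_ports / allocations_per_sec


def max_sustainable_rate(available_ports: int,
                         time_wait_s: float = TIME_WAIT_SECONDS) -> float:
    """Highest allocation rate the pool can sustain indefinitely."""
    return available_ports / time_wait_s


if __name__ == "__main__":
    ports, rate = 28232, 750
    print(f"ports needed in steady state: {ports_held_in_time_wait(rate):.0f}")
    print(f"pool of {ports} exhausted after ~{seconds_until_exhausted(ports, rate):.1f} s")
    print(f"max sustainable rate: ~{max_sustainable_rate(ports):.0f} connections/s")
```

With the answer's figures (28232 ports, 750 allocations/sec) this prints a steady-state demand of 45000 ports, exhaustion after roughly 37.6 s, and a sustainable ceiling of about 471 connections/s. The ceiling scales linearly with the pool size and inversely with the TIME_WAIT lifetime, which is why the fixes in the answer target one or the other.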
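As an alternative to grepping netstat output, the same socket states can be read directly from the kernel tables in /proc/net/tcp and /proc/net/tcp6, where the fourth whitespace-separated column holds the TCP state as a hex code (06 is TIME_WAIT, 0A is LISTEN). This is a minimal sketch assuming a Linux host; the function names are illustrative, not part of any tool.

```python
# Count TCP sockets per state by reading /proc/net/tcp and /proc/net/tcp6,
# as an alternative to `netstat -n -t -o | grep timewait`. Linux-only.
import os
from collections import Counter

# Linux TCP state codes as they appear (in hex) in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_text: str) -> Counter:
    """Tally socket states from the text of a /proc/net/tcp-style table."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Columns: sl, local_address, rem_address, st, ...
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def count_live_states(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> Counter:
    """Sum state counts over the IPv4/IPv6 tables that exist on this host."""
    total = Counter()
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                total += count_states(f.read())
    return total


if __name__ == "__main__":
    states = count_live_states()
    print("TIME_WAIT:", states["TIME_WAIT"])
    print("all TCP sockets:", sum(states.values()))
```

Unlike piping netstat into `wc -l`, the total here excludes header lines, so it can be compared directly against the size of the ephemeral port range.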
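Before and after applying the sysctl changes, the current values can be checked by reading the corresponding files under /proc/sys. The sketch below assumes a Linux host and only reads the settings; changing them still requires root and `sysctl -w`, and values set that way do not survive a reboot unless they are also placed in /etc/sysctl.conf. The helper names are hypothetical.

```python
# Inspect the sysctls that the fixes change, by reading /proc/sys (read-only).
# Assumes Linux; setting them still needs root and `sysctl -w`.
from pathlib import Path

PORT_RANGE = Path("/proc/sys/net/ipv4/ip_local_port_range")
TW_REUSE = Path("/proc/sys/net/ipv4/tcp_tw_reuse")


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the 'low<TAB>high' contents of ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def range_size(low: int, high: int) -> int:
    """Number of ports in the range; both ends are usable, hence the +1."""
    return high - low + 1


if __name__ == "__main__":
    if PORT_RANGE.exists():
        low, high = parse_port_range(PORT_RANGE.read_text())
        print(f"ephemeral ports: {low}-{high} ({range_size(low, high)} ports)")
    if TW_REUSE.exists():
        print(f"tcp_tw_reuse = {TW_REUSE.read_text().strip()}")
```

On a kernel with the common default range of `32768 60999` this reports 28232 ports, the figure used in the answer; after setting the range to `1024 65000` it would report 63977.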