
Random Unicorn timeouts and flapping
<p>I'm running a pretty high-traffic Rails 3.2 app on Unicorn and Nginx (multiple web nodes), but every once in a while I will see Unicorn workers start timing out and getting sigkilled by the Unicorn master across all nodes. Of course, when a Unicorn worker gets sigkilled by the Unicorn master, a new worker gets forked in its place, but it also just hangs for 60 seconds then times out and gets killed. This basically happens repeatedly until I hard kill all Unicorn masters and workers.</p>

<p><code>Unicorn log:</code></p>

<pre><code>E, [2013-04-18T12:57:50.007623 #14002] ERROR -- : worker=8 PID:14968 timeout (62s &gt; 60s), killing
E, [2013-04-18T12:57:50.108364 #14002] ERROR -- : reaped #&lt;Process::Status: pid 14968 SIGKILL (signal 9)&gt; worker=8
I, [2013-04-18T12:57:50.489505 #15726] INFO -- : worker=8 ready
E, [2013-04-18T12:57:52.175842 #14002] ERROR -- : worker=5 PID:15033 timeout (61s &gt; 60s), killing
E, [2013-04-18T12:57:52.276586 #14002] ERROR -- : reaped #&lt;Process::Status: pid 15033 SIGKILL (signal 9)&gt; worker=5
I, [2013-04-18T12:57:52.653069 #15782] INFO -- : worker=5 ready
E, [2013-04-18T12:57:56.340290 #14002] ERROR -- : worker=3 PID:15074 timeout (61s &gt; 60s), killing
E, [2013-04-18T12:57:56.440993 #14002] ERROR -- : reaped #&lt;Process::Status: pid 15074 SIGKILL (signal 9)&gt; worker=3
I, [2013-04-18T12:57:56.809730 #15832] INFO -- : worker=3 ready
E, [2013-04-18T12:57:57.504142 #14002] ERROR -- : worker=7 PID:15087 timeout (61s &gt; 60s), killing
E, [2013-04-18T12:57:57.604886 #14002] ERROR -- : reaped #&lt;Process::Status: pid 15087 SIGKILL (signal 9)&gt; worker=7
I, [2013-04-18T12:57:57.983581 #15845] INFO -- : worker=7 ready
E, [2013-04-18T12:57:59.669664 #14002] ERROR -- : worker=4 PID:15108 timeout (61s &gt; 60s), killing
E, [2013-04-18T12:57:59.770427 #14002] ERROR -- : reaped #&lt;Process::Status: pid 15108 SIGKILL (signal 9)&gt; worker=4
I, [2013-04-18T12:58:00.155461 #15879] INFO -- : worker=4 ready
E, [2013-04-18T12:58:06.839906 #14002] ERROR -- : worker=9 PID:15192 timeout (61s &gt; 60s), killing
E, [2013-04-18T12:58:06.940829 #14002] ERROR -- : reaped #&lt;Process::Status: pid 15192 SIGKILL (signal 9)&gt; worker=9
I, [2013-04-18T12:58:07.302766 #15956] INFO -- : worker=9 ready
E, [2013-04-18T12:58:08.003330 #14002] ERROR -- : worker=6 PID:15213 timeout (61s &gt; 60s), killing
E, [2013-04-18T12:58:08.104006 #14002] ERROR -- : reaped #&lt;Process::Status: pid 15213 SIGKILL (signal 9)&gt; worker=6
I, [2013-04-18T12:58:08.466790 #15973] INFO -- : worker=6 ready
</code></pre>

<p>Monitoring systems show that external services (Postgres database, Memcached, Redis) are all responding properly and without latency issues.</p>

<p>Here are some outputs that may be of value:</p>

<p>During these outages I notice a huge backlog of attempted connections to the Unicorn socket. When the site isn't down, usually the following command returns one or two lines only.</p>

<p><code>netstat | grep unic</code></p>

<pre><code>....
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7768134 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7767311 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7766999 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7767309 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7766941 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7767287 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7766225 /tmp/unicorn.sock
</code></pre>

<p>Anyone have an idea what might be causing this? This happens across multiple servers, all at the same time.</p>
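To compare an outage against normal operation, the two diagnostics quoted above (the master's timeout log and the `netstat` output) could be summarised with a small script. The following is only a sketch for illustration: the helper names `timeout_kills_by_worker` and `socket_state_counts` are hypothetical, the regular expression is inferred from the log lines shown in the question, and the code assumes the raw log text (with a literal `>` rather than the HTML-escaped `&gt;`). It tallies symptoms and does not diagnose the cause.

```ruby
# Hypothetical helpers for summarising the diagnostics quoted in the question.

# Matches master log lines of the form seen in the question, e.g.
#   ... ERROR -- : worker=8 PID:14968 timeout (62s > 60s), killing
TIMEOUT_LINE = /worker=(\d+) PID:(\d+) timeout \((\d+)s > (\d+)s\), killing/

# Returns a hash of worker number => number of timeout kills in the log text.
def timeout_kills_by_worker(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    match = TIMEOUT_LINE.match(line)
    counts[match[1].to_i] += 1 if match
  end
  counts
end

# Returns a hash of socket state => count for netstat lines that mention
# socket_path. Only the states relevant here are recognised.
def socket_state_counts(netstat_text, socket_path)
  counts = Hash.new(0)
  netstat_text.each_line do |line|
    next unless line.include?(socket_path)
    state = line[/\b(CONNECTING|CONNECTED|LISTENING)\b/, 1]
    counts[state] += 1 if state
  end
  counts
end

if __FILE__ == $PROGRAM_NAME
  # Sample input copied from the question (first two timeouts only).
  sample_log = <<~'LOG'
    E, [2013-04-18T12:57:50.007623 #14002] ERROR -- : worker=8 PID:14968 timeout (62s > 60s), killing
    I, [2013-04-18T12:57:50.489505 #15726] INFO -- : worker=8 ready
    E, [2013-04-18T12:57:52.175842 #14002] ERROR -- : worker=5 PID:15033 timeout (61s > 60s), killing
  LOG
  sample_netstat = <<~'NETSTAT'
    unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
    unix 2 [ ] STREAM CONNECTED 7768134 /tmp/unicorn.sock
  NETSTAT

  p timeout_kills_by_worker(sample_log)
  p socket_state_counts(sample_netstat, "/tmp/unicorn.sock")
end
```

Run periodically, a rising `CONNECTING` count alongside repeated kills of every worker would match the pattern described in the question, where Nginx keeps queuing connections that no worker accepts.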
 
