<p>Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.</p>
<p>There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say <strong>no</strong>.</p>
<p>Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that <em>starts</em> your process can reliably wait for it to end. In bash, this is absolutely trivial.</p>
<pre><code>until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning.." &gt;&amp;2
    sleep 1
done
</code></pre>
<p>The above piece of bash code runs <code>myserver</code> in an <code>until</code> loop. The first line starts <code>myserver</code> and waits for it to end. When it ends, <code>until</code> checks its exit status. If the exit status is <code>0</code>, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is <em>not</em> <code>0</code>, <code>until</code> will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) <em>after 1 second</em>.</p>
<p>Why do we wait a second? Because if something's wrong with the startup sequence of <code>myserver</code> and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The <code>sleep 1</code> takes away the strain from that.</p>
<p>Now all you need to do is start this bash script (asynchronously, probably), and it will monitor <code>myserver</code> and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an <code>@reboot</code> rule.
Open your cron rules with <code>crontab</code>:</p>
<pre><code>crontab -e
</code></pre>
<p>Then add a rule to start your monitor script:</p>
<pre><code>@reboot /usr/local/bin/myservermonitor
</code></pre>
<hr>
<p>Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have <code>myserver</code> start at a certain init level and be respawned automatically.</p>
<hr>
<p>Edit.</p>
<p>Let me add some information on why <strong>not</strong> to use PID files. While they are very popular, they are also very flawed and there's no reason why you wouldn't just do it the correct way.</p>
<p>Consider this:</p>
<ol>
<li><p>PID recycling (killing the wrong process):</p>
<ul>
<li><code>/etc/init.d/foo start</code>: start <code>foo</code>, write <code>foo</code>'s PID to <code>/var/run/foo.pid</code></li>
<li>A while later: <code>foo</code> dies somehow.</li>
<li>A while later: any random process that starts (call it <code>bar</code>) takes a random PID; imagine it taking <code>foo</code>'s old PID.</li>
<li>You notice <code>foo</code>'s gone: <code>/etc/init.d/foo restart</code> reads <code>/var/run/foo.pid</code>, checks to see if it's still alive, finds <code>bar</code>, thinks it's <code>foo</code>, kills it, starts a new <code>foo</code>.</li>
</ul></li>
<li><p>PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to <code>1.</code>.</p></li>
<li><p>What if you don't even have write access or are in a read-only environment?</p></li>
<li><p>It's pointless overcomplication; see how simple my example above is.
No need to complicate that, at all.</p></li>
</ol>
<p>See also: <a href="https://stackoverflow.com/questions/25906020/are-pid-files-still-flawed-when-doing-it-right/25933330#25933330">Are PID-files still flawed when doing it &#39;right&#39;?</a></p>
<p>By the way, <strong>even worse than PID files is parsing <code>ps</code>!</strong> Don't ever do this.</p>
<ol>
<li><code>ps</code> is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!</li>
<li>Parsing <code>ps</code> leads to a LOT of false positives. Take the <code>ps aux | grep PID</code> example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.</li>
</ol>
<p>If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as monitor for your processes. Look into <a href="http://smarden.org/runit/" rel="noreferrer">runit</a>, for example.</p>
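<p>If you want to see the <code>until</code> loop in action without a real server, here is a minimal sketch. The names <code>respawn</code> and <code>flaky_server</code> are made up for this illustration and are not part of any tool; <code>flaky_server</code> is a shell function standing in for <code>myserver</code> that fails twice and then exits cleanly, and the delay is set to <code>0</code> only so the demo finishes immediately.</p>

```shell
#!/usr/bin/env bash
# Sketch only: respawn and flaky_server are hypothetical names for this demo.

# respawn DELAY COMMAND [ARGS...]
# Starts COMMAND, waits for it to end, and restarts it after DELAY seconds
# for as long as it exits with a non-zero status.
respawn() {
    local delay=$1
    shift
    until "$@"; do
        # $? here is the exit status of the command that just ended.
        echo "'$1' crashed with exit code $?. Respawning.." >&2
        sleep "$delay"
    done
}

# Stand-in for myserver: "crashes" on the first two runs, then exits with 0.
attempts=0
flaky_server() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

respawn 0 flaky_server
echo "flaky_server exited cleanly after $attempts attempts"
```

<p>For a real daemon you would call something like <code>respawn 1 myserver</code>, which keeps the one-second pause from the loop above.</p>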
 
