
Django long running asynchronous tasks with threads/processing
<p><strong>Disclaimer</strong>: I do know that there are several similar questions on SO. I think I've read most if not all of them, but did not find an answer to my real question (see later). I also know that using celery or another asynchronous queue system is the best way to achieve long running tasks - or at least using a cron-managed script. There's also the <a href="http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading" rel="noreferrer">mod_wsgi doc about processes and threads</a>, but I'm not sure I got it all correct.</p>

<p>The question is:</p>

<p>What are the exact risks/issues involved with using the solutions listed below? Are any of them viable for long running tasks (ok, even though celery is better suited)? My question is really more about understanding the internals of wsgi and python/django than finding the best overall solution. Issues with blocking threads, unsafe access to variables, zombie processes, etc.</p>

<p>Let's say:</p>

<ol>
<li>my "long_process" is doing something really safe. Even if it fails, I don't care.</li>
<li>python >= 2.6</li>
<li>I'm using mod_wsgi with apache (will anything change with uwsgi or gunicorn?) in daemon mode</li>
</ol>

<p>mod_wsgi conf:</p>

<pre><code>WSGIDaemonProcess NAME user=www-data group=www-data threads=25
WSGIScriptAlias / /path/to/wsgi.py
WSGIProcessGroup %{ENV:VHOST}
</code></pre>

<p>I figured that these are the options available to launch a separate <em>process</em> (meant in a broad sense) to carry on a long running task while quickly returning a response to the user:</p>

<h2>os.fork</h2>

<pre><code>import os

if os.fork() == 0:
    long_process()
else:
    return HttpResponse()
</code></pre>

<h2>subprocess</h2>

<pre><code>import subprocess
import sys

p = subprocess.Popen([sys.executable, '/path/to/script.py'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
</code></pre>

<p>(where the script is likely to be a manage.py command)</p>

<h2>threads</h2>

<pre><code>import threading

t = threading.Thread(target=long_process, args=args, kwargs=kwargs)
t.setDaemon(True)
t.start()
return HttpResponse()
</code></pre>

<p>NB.</p>

<blockquote>
<p>Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.</p>
</blockquote>

<p>The main thread will quickly return (the HttpResponse). Will the spawned long-running thread block wsgi from doing something else for another request?</p>

<h2>multiprocessing</h2>

<pre><code>from multiprocessing import Process

p = Process(target=_bulk_action, args=(action, objs))
p.start()
return HttpResponse()
</code></pre>

<p>This should solve the thread concurrency issue, shouldn't it?</p>

<hr>

<p>So those are the options I could think of. What would work and what wouldn't, and why?</p>
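One detail behind the "zombie processes" worry in the question: none of the fork/process options above waits on the child it creates, so finished children linger as zombies until the web worker exits. A common POSIX answer is the double-fork, sketched below as an assumption rather than anything from the original post (`spawn_detached` and the file-writing `long_process` are made-up names for illustration): fork twice so the grandchild is adopted and reaped by init, while the parent only has to reap the short-lived first child.

```python
import os


def spawn_detached(task, *args):
    """Hedged sketch of a POSIX double-fork: run `task` in a detached
    grandchild process so the calling (web) process never has to reap it."""
    pid = os.fork()
    if pid == 0:
        # First child: fork the real worker, then exit immediately.
        # The grandchild is orphaned and adopted (and reaped) by init.
        if os.fork() == 0:
            try:
                task(*args)
            finally:
                # Exit without running atexit/WSGI cleanup in the child.
                os._exit(0)
        os._exit(0)
    # Parent: reap the short-lived first child right away - no zombie.
    os.waitpid(pid, 0)


def long_process(path):
    # Stand-in for the real long-running work: write a marker file.
    with open(path, "w") as f:
        f.write("done")


if __name__ == "__main__":
    import tempfile
    marker = os.path.join(tempfile.gettempdir(), "long_process_marker.txt")
    spawn_detached(long_process, marker)
    # A view would return its HttpResponse here without blocking.
```

The plain `os.fork` option in the question skips both the second fork and the `os._exit`, which is exactly what lets the child fall back into the WSGI stack and leave a zombie behind.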