<p>There are 2 ways to do things simultaneously. Or, really, 2-3/4 or so:</p>

<ul> <li>Multiple threads <ul> <li>Or multiple processes, especially if the "things" take a lot of CPU power</li> <li>Or coroutines or greenlets, especially if there are thousands of "things"</li> <li>Or pools of one of the above</li> </ul></li> <li>Event loops (coded manually, or provided by a framework) <ul> <li>Or hybrid greenlet/event loop systems like <code>gevent</code>.</li> </ul></li> </ul>

<hr>

<p>If you have 1000 URLs, you probably don't want to do 1000 requests at the same time. For example, web browsers typically only do something like 8 requests at a time. A pool is a nice way to do only 8 things at a time, so let's do that.</p>

<p>And, since you're only doing 8 things at a time, and those things are primarily I/O bound, threads are perfect.</p>

<hr>

<p>I'll implement it with <a href="http://docs.python.org/3.3/library/concurrent.futures.html" rel="nofollow"><code>futures</code></a>. (If you're using Python 2.x, or 3.0-3.1, you will need to install the backport, <a href="https://pypi.python.org/pypi/futures" rel="nofollow"><code>futures</code></a>.)</p>

<pre><code>import concurrent.futures

urls = ['http://example.com/foo', 'http://example.com/bar']

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    result = b''.join(executor.map(download, urls))

with open('output_file', 'wb') as f:
    f.write(result)
</code></pre>

<hr>

<p>Of course you need to write the <code>download</code> function, but that's exactly the same function you'd write if you were doing these one at a time.</p>

<p>For example, using <a href="http://docs.python.org/3.3/library/urllib.request.html" rel="nofollow"><code>urlopen</code></a> (if you're using Python 2.x, use <code>urllib2</code> instead of <code>urllib.request</code>):</p>

<pre><code>import urllib.request

def download(url):
    with urllib.request.urlopen(url) as f:
        return f.read()
</code></pre>

<hr>

<p>If you want to learn how to build a thread pool executor yourself, <a href="http://hg.python.org/cpython/file/3.3/Lib/concurrent/futures/thread.py" rel="nofollow">the source</a> is actually pretty simple, and <a href="http://hg.python.org/cpython/file/3.3/Lib/multiprocessing/pool.py" rel="nofollow"><code>multiprocessing.pool</code></a> is another nice example in the stdlib.</p>

<p>However, both of those have a lot of excess code (handling weak references to improve memory usage, shutting down cleanly, offering different ways of waiting on the results, propagating exceptions properly, etc.) that may get in your way.</p>

<p>If you look around PyPI and ActiveState, you will find simpler designs like <a href="https://pypi.python.org/pypi/threadpool" rel="nofollow"><code>threadpool</code></a> that may be easier to understand.</p>

<p>But here's the simplest joinable threadpool:</p>

<pre><code>import queue
import threading

class ThreadPool(object):
    def __init__(self, max_workers):
        self.queue = queue.Queue()
        self.workers = [threading.Thread(target=self._worker)
                        for _ in range(max_workers)]

    def start(self):
        for worker in self.workers:
            worker.start()

    def stop(self):
        # One sentinel per worker tells each thread to exit its loop.
        for _ in range(len(self.workers)):
            self.queue.put(None)
        for worker in self.workers:
            worker.join()

    def submit(self, job):
        self.queue.put(job)

    def _worker(self):
        while True:
            job = self.queue.get()
            if job is None:
                break
            job()
</code></pre>

<p>Of course the downside of a dead-simple implementation is that it's not as friendly to use as <code>concurrent.futures.ThreadPoolExecutor</code>:</p>

<pre><code>import functools
import threading
import urllib.request

urls = ['http://example.com/foo', 'http://example.com/bar']
results = [None] * len(urls)
results_lock = threading.Lock()

def download(url, i):
    with urllib.request.urlopen(url) as f:
        result = f.read()
    with results_lock:
        results[i] = result

pool = ThreadPool(max_workers=8)
pool.start()
for i, url in enumerate(urls):
    pool.submit(functools.partial(download, url, i))
pool.stop()

result = b''.join(results)
with open('output_file', 'wb') as f:
    f.write(result)
</code></pre>