<p>There are 2 ways to do things simultaneously. Or, really, 2-3/4 or so:</p>

<ul> <li>Multiple threads <ul> <li>Or multiple processes, especially if the "things" take a lot of CPU power</li> <li>Or coroutines or greenlets, especially if there are thousands of "things"</li> <li>Or pools of one of the above</li> </ul></li> <li>Event loops (coded manually, or provided by a framework) <ul> <li>Or hybrid greenlet/event loop systems like <code>gevent</code>.</li> </ul></li> </ul>

<hr>

<p>If you have 1000 URLs, you probably don't want to do 1000 requests at the same time. For example, web browsers typically only do something like 8 requests at a time. A pool is a nice way to do only 8 things at a time, so let's do that.</p>

<p>And, since you're only doing 8 things at a time, and those things are primarily I/O bound, threads are perfect.</p>

<hr>

<p>I'll implement it with <a href="http://docs.python.org/3.3/library/concurrent.futures.html" rel="nofollow"><code>futures</code></a>. (If you're using Python 2.x, or 3.0-3.1, you will need to install the backport, <a href="https://pypi.python.org/pypi/futures" rel="nofollow"><code>futures</code></a>.)</p>

<pre><code>import concurrent.futures

urls = ['http://example.com/foo', 'http://example.com/bar']

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    result = b''.join(executor.map(download, urls))

with open('output_file', 'wb') as f:
    f.write(result)
</code></pre>

<hr>

<p>Of course you need to write the <code>download</code> function, but that's exactly the same function you'd write if you were doing these one at a time.</p>

<p>For example, using <a href="http://docs.python.org/3.3/library/urllib.request.html" rel="nofollow"><code>urlopen</code></a> (if you're using Python 2.x, use <code>urllib2</code> instead of <code>urllib.request</code>):</p>

<pre><code>import urllib.request

def download(url):
    with urllib.request.urlopen(url) as f:
        return f.read()
</code></pre>

<hr>

<p>If you want to learn how to build a thread pool executor yourself, <a href="http://hg.python.org/cpython/file/3.3/Lib/concurrent/futures/thread.py" rel="nofollow">the source</a> is actually pretty simple, and <a href="http://hg.python.org/cpython/file/3.3/Lib/multiprocessing/pool.py" rel="nofollow"><code>multiprocessing.pool</code></a> is another nice example in the stdlib.</p>

<p>However, both of those have a lot of excess code (handling weak references to improve memory usage, shutting down cleanly, offering different ways of waiting on the results, propagating exceptions properly, etc.) that may get in your way.</p>

<p>If you look around PyPI and ActiveState, you will find simpler designs like <a href="https://pypi.python.org/pypi/threadpool" rel="nofollow"><code>threadpool</code></a> that may be easier to understand.</p>

<p>But here's the simplest joinable threadpool:</p>

<pre><code>import queue
import threading

class ThreadPool(object):
    def __init__(self, max_workers):
        self.queue = queue.Queue()
        self.workers = [threading.Thread(target=self._worker)
                        for _ in range(max_workers)]

    def start(self):
        for worker in self.workers:
            worker.start()

    def stop(self):
        # One sentinel per worker tells each thread to exit its loop.
        for _ in range(len(self.workers)):
            self.queue.put(None)
        for worker in self.workers:
            worker.join()

    def submit(self, job):
        self.queue.put(job)

    def _worker(self):
        while True:
            job = self.queue.get()
            if job is None:
                break
            job()
</code></pre>

<p>Of course the downside of a dead-simple implementation is that it's not as friendly to use as <code>concurrent.futures.ThreadPoolExecutor</code>:</p>

<pre><code>import functools
import threading
import urllib.request

urls = ['http://example.com/foo', 'http://example.com/bar']
results = [None] * len(urls)
results_lock = threading.Lock()

def download(url, i):
    with urllib.request.urlopen(url) as f:
        result = f.read()
    with results_lock:
        results[i] = result

pool = ThreadPool(max_workers=8)
pool.start()
for i, url in enumerate(urls):
    pool.submit(functools.partial(download, url, i))
pool.stop()

result = b''.join(results)
with open('output_file', 'wb') as f:
    f.write(result)
</code></pre>