Why do Python multi-threading and Queue not help accelerate reading a large number of files?
<p>I am writing a Python program to read 110,000+ text files from the local file system and push them into MongoDB. Here is my code snippet.</p>
<pre><code>import os
import threading


class EmailProducer(threading.Thread):
    def __init__(self, threadID, queue, path):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.queue = queue
        self.path = path

    def run(self):
        # Fill the queue with file names only if it is currently empty.
        if self.queue.empty():
            files = os.listdir(self.path)
            print(len(files))
            for file in files:
                self.queue.put(file)
</code></pre>
<pre><code>class EmailConsumer(threading.Thread):
    def __init__(self, threadID, queue, path, mongoConn):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.queue = queue
        self.mongoConn = mongoConn
        self.path = path

    def run(self):
        # Take file names from the queue until it is empty, then disconnect.
        while True:
            if self.queue.empty():
                self.mongoConn.close()
                break
            file = self.queue.get()
            self.mongoConn.persist(self.path, file)
</code></pre>
<p>The EmailProducer instance lists the files in the local directory and puts their names on the queue if the queue is empty; the EmailConsumer instances take file names from the queue and push the files into MongoDB. I also wrote a sequential version of the same functionality. I ran both on my Ubuntu 12.04 32-bit desktop with a quad-core i5 processor and timed them. The multithreaded version was started with 1 producer and 7 consumers. However, both versions took approximately 23.7 sec of real time and 21.7 sec of user time. I thought threading would help here, but the numbers show that it does not.</p>
<p>Does anyone have any insight into the reason?</p>
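<p>For reference, the producer/consumer arrangement described above can be sketched in a self-contained form. This is only an illustration under stated assumptions, not the original program: it uses Python 3's <code>queue</code> module, the <code>persist</code> callable is a hypothetical stand-in for <code>mongoConn.persist</code>, and the names <code>SENTINEL</code> and <code>run_pipeline</code> are invented here. It also swaps the <code>empty()</code> check for a sentinel-based shutdown, because with several consumers another thread can take the last item between <code>empty()</code> and <code>get()</code>, and a consumer that starts before the producer has filled the queue would see it empty and exit at once.</p>

```python
import queue
import threading

SENTINEL = None  # hypothetical marker telling a consumer to stop


def producer(q, names, n_consumers):
    """Put every file name on the queue, then one stop marker per consumer."""
    for name in names:
        q.put(name)
    for _ in range(n_consumers):
        q.put(SENTINEL)


def consumer(q, persist, results, lock):
    """Process names until a stop marker arrives."""
    while True:
        name = q.get()  # blocks until an item is available
        if name is SENTINEL:
            break
        value = persist(name)  # stand-in for mongoConn.persist (assumption)
        with lock:
            results.append(value)


def run_pipeline(names, persist, n_consumers=7):
    """Run one producer (in the calling thread) and n_consumers worker threads."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consumer, args=(q, persist, results, lock))
        for _ in range(n_consumers)
    ]
    for w in workers:
        w.start()
    producer(q, names, n_consumers)
    for w in workers:
        w.join()
    return results
```

<p>Calling <code>run_pipeline(os.listdir(path), some_persist_function)</code> would mirror the 1 producer and 7 consumers setup from the question; the order of the returned results is not guaranteed, since the workers finish independently.</p>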
 
