
python -> multiprocessing module
Here's what I am trying to accomplish:

1. I have about a million files which I need to parse and append the parsed content to a single file.
2. Since a single process takes ages, that option is out.
3. I am not using threads in Python, as that essentially comes down to running a single process (because of the GIL).
4. Hence I am using the multiprocessing module, i.e. spawning 4 sub-processes to utilize all that raw core power :)

So far so good. Now I need a shared object which all the sub-processes have access to, so I am using Queues from the multiprocessing module. All the sub-processes also need to write their output to a single file, which is a potential place to use Locks, I guess. With this setup, when I run it I do not get any error (so the parent process seems fine), it just stalls. When I press Ctrl-C I see a traceback (one for each sub-process), and no output is written to the output file. Here is the code (note that everything runs fine without multiprocessing):

```python
import os
import glob
from multiprocessing import Process, Queue, Pool

data_file = open('out.txt', 'w+')

def worker(task_queue):
    for file in iter(task_queue.get, 'STOP'):
        data = mine_imdb_page(os.path.join(DATA_DIR, file))
        if data:
            data_file.write(repr(data) + '\n')
    return

def main():
    task_queue = Queue()
    for file in glob.glob('*.csv'):
        task_queue.put(file)
    task_queue.put('STOP')  # so that worker processes know when to stop

    # this is the block of code that needs correction.
    if multi_process:
        # One way to spawn 4 processes
        # pool = Pool(processes=4)  # Start worker processes
        # res = pool.apply_async(worker, [task_queue, data_file])
        # But I chose to do it like this for now.
        for i in range(4):
            proc = Process(target=worker, args=[task_queue])
            proc.start()
    else:  # single process mode is working fine!
        worker(task_queue)
    data_file.close()
    return
```

What am I doing wrong? I also tried passing the open file object to each of the processes at the time of spawning, e.g. `Process(target=worker, args=[task_queue, data_file])`, but that did not change anything. I feel the sub-processes are not able to write to the file for some reason: either the instance of the `file_object` is not getting replicated at the time of spawning, or there is some other quirk. Anybody got an idea?

**EXTRA:** Also, is there any way to keep a persistent mysql_connection open and pass it across to the sub-processes? That is, I open a MySQL connection in my parent process, and the open connection should be accessible to all my sub-processes. Basically this would be the equivalent of shared memory in Python. Any ideas here?
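For reference, here is a minimal sketch of one common way to restructure this (not necessarily the only fix): workers push parsed data back through a second Queue and the parent does all the writing, with one 'STOP' sentinel per worker so every worker can exit its loop. `mine_imdb_page` and `DATA_DIR` are the asker's own names and are only stubbed out here so the example runs on its own.

```python
import os
import glob
from multiprocessing import Process, Queue

DATA_DIR = '.'  # placeholder for the asker's data directory


def mine_imdb_page(path):
    # stand-in for the asker's parser; returns something truthy per file
    with open(path) as f:
        return f.read()[:100]


def worker(task_queue, result_queue):
    # each worker parses files and ships results back to the parent
    # instead of touching a shared file handle directly
    for name in iter(task_queue.get, 'STOP'):
        data = mine_imdb_page(os.path.join(DATA_DIR, name))
        if data:
            result_queue.put(data)
    result_queue.put('STOP')  # tell the parent this worker is done


def main():
    num_workers = 4
    task_queue = Queue()
    result_queue = Queue()

    for name in glob.glob('*.csv'):
        task_queue.put(name)
    for _ in range(num_workers):
        task_queue.put('STOP')  # one sentinel per worker, not just one

    procs = [Process(target=worker, args=(task_queue, result_queue))
             for _ in range(num_workers)]
    for p in procs:
        p.start()

    # the parent is the only writer, so no Lock on the file is needed
    finished = 0
    with open('out.txt', 'w') as data_file:
        while finished < num_workers:
            item = result_queue.get()
            if item == 'STOP':
                finished += 1
            else:
                data_file.write(repr(item) + '\n')

    for p in procs:
        p.join()


if __name__ == '__main__':
    main()
```

The same "one owner per resource" idea applies to the MySQL part of the question: rather than sharing a single connection object across forked processes, each worker usually opens its own connection, or the results are funnelled back to the parent, which holds the only connection.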
 
