<p>First of all, I would suggest avoiding urllib like the plague and using requests instead, which has really easy support for proxies: <a href="http://docs.python-requests.org/en/latest/user/advanced/#proxies" rel="nofollow">http://docs.python-requests.org/en/latest/user/advanced/#proxies</a><br> Beyond that, I haven't used it with multithreading, only with multiprocessing, and that worked really well. The only thing you have to figure out is whether you have a dynamic queue or a fairly fixed list that you can spread over workers. Here is an example of the latter, which spreads a list of URLs evenly over nr_processes processes:</p>
<pre><code>import math
import multiprocessing

# url_loop (not shown), total_nr_urls and job_id_input come from the rest of the script;
# each url_loop call handles the slice of rows from start_row to end_row.
if __name__ == '__main__':
    # *** prepare multiprocessing
    nr_processes = 4
    # round up so the last chunk picks up any leftover URLs
    chunksize = int(math.ceil(total_nr_urls / float(nr_processes)))
    procs = []

    # *** start up processes
    for i in range(nr_processes):
        start_row = chunksize * i
        end_row = min(chunksize * (i + 1), total_nr_urls)
        p = multiprocessing.Process(
            target=url_loop, args=(start_row, end_row, str(i), job_id_input))
        procs.append(p)
        p.start()

    # *** wait for all worker processes to finish
    for p in procs:
        p.join()
</code></pre>
<p>Every url_loop process writes its own set of data to tables in a database, so I don't have to worry about joining the results together in Python.</p>
<p>Edit: on sharing data between processes, for details see: <a href="http://docs.python.org/2/library/multiprocessing.html?highlight=multiprocessing#multiprocessing" rel="nofollow">http://docs.python.org/2/library/multiprocessing.html?highlight=multiprocessing#multiprocessing</a></p>
<pre><code>from multiprocessing import Process, Value, Array

def f(n, a):
    # write to the shared double and negate every element of the shared int array
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)        # shared double, initially 0.0
    arr = Array('i', range(10))  # shared int array holding 0..9
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])
</code></pre>
<p>As you can see, these special types (Value &amp; Array) are what enable sharing data between the processes.
If you are instead looking for a queue to hand out work in a round-robin-like way, you can use JoinableQueue. Hope this helps!</p>
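<p>For the dynamic-queue case, here is a minimal sketch of the JoinableQueue suggestion at the end of the answer. The names fetch, worker and crawl are hypothetical: fetch is an offline stand-in for the real per-URL request, and None is used as a stop sentinel, one per worker. Idle workers pull the next URL from the shared queue, so work is spread by availability rather than in strict round-robin order:</p>

```python
import multiprocessing


def fetch(url):
    # Hypothetical stand-in for the real per-URL work (e.g. a requests call);
    # it just returns the URL's length so the sketch runs offline.
    return len(url)


def worker(tasks, results):
    # Pull URLs from the shared JoinableQueue until the None sentinel arrives.
    while True:
        url = tasks.get()
        try:
            if url is None:
                return
            results.put((url, fetch(url)))
        finally:
            # Mark every item (sentinels included) as done so tasks.join() can return.
            tasks.task_done()


def crawl(urls, nr_processes=4):
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(tasks, results))
             for _ in range(nr_processes)]
    for p in procs:
        p.start()
    for url in urls:
        tasks.put(url)
    # One sentinel per worker so every process leaves its loop.
    for _ in procs:
        tasks.put(None)
    tasks.join()  # blocks until task_done() was called for every queued item
    # Drain the results before joining the processes.
    collected = dict(results.get() for _ in urls)
    for p in procs:
        p.join()
    return collected


if __name__ == '__main__':
    print(crawl(['http://a.example', 'http://bb.example']))
```

<p>The results are drained before the processes are joined because the multiprocessing docs warn that joining a process which still has items buffered for a queue can deadlock.</p>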
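<p>The start_row/end_row arithmetic from the first snippet can be checked without starting any processes. chunk_bounds is a hypothetical helper introduced only for illustration; its body repeats the snippet's ceil-based chunk size and its min() clamp against total_nr_urls:</p>

```python
import math


def chunk_bounds(total_nr_urls, nr_processes):
    # Hypothetical helper: returns the (start_row, end_row) pair each process receives.
    # Round the chunk size up so no URL is left over.
    chunksize = int(math.ceil(total_nr_urls / float(nr_processes)))
    bounds = []
    for i in range(nr_processes):
        start_row = chunksize * i
        # Clamp to the number of URLs; trailing workers may get an empty range.
        end_row = min(chunksize * (i + 1), total_nr_urls)
        bounds.append((start_row, end_row))
    return bounds


print(chunk_bounds(10, 4))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
print(chunk_bounds(2, 4))   # [(0, 1), (1, 2), (2, 2), (3, 2)]
```

<p>For 2 URLs over 4 processes the last pair is (3, 2): start_row exceeds end_row, which gives an empty slice, so that worker gets no URLs. The answer does not show how url_loop treats such a range, so it is worth checking.</p>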