Python urllib3 and proxy
<p>I am trying to figure out how to use a proxy together with multithreading.</p>
<p>This code works:</p>
<pre><code>requester = urllib3.PoolManager(maxsize=10, headers=self.headers)
thread_pool = workerpool.WorkerPool()
thread_pool.map(grab_wrapper, [item['link'] for item in products])
thread_pool.shutdown()
thread_pool.wait()
</code></pre>
<p>Then, in <code>grab_wrapper</code>:</p>
<pre><code>requested_page = requester.request('GET', url, assert_same_host=False, headers=self.headers)
</code></pre>
<p>The headers consist of Accept, Accept-Charset, Accept-Encoding, Accept-Language and User-Agent.</p>
<p>But this does not work in production, because requests there have to go through a proxy (no authorization is required).</p>
<p>I tried different things (passing <code>proxies</code> to the request, putting the proxy in the headers, etc.). The only thing that works is this:</p>
<pre><code>requester = urllib3.proxy_from_url(self._PROXY_URL, maxsize=7, headers=self.headers)
thread_pool = workerpool.WorkerPool(size=10)
thread_pool.map(grab_wrapper, [item['link'] for item in products])
thread_pool.shutdown()
thread_pool.wait()
</code></pre>
<p>Now, when I run the program, it makes 10 requests (10 threads) and then... stops. No error, no warning whatsoever. This is the only way I can get through the proxy, but it seems like it's not possible to use <code>proxy_from_url</code> and <code>WorkerPool</code> together.</p>
<p>Any ideas how to combine those two into working code? I would rather avoid rewriting it with Scrapy, etc., due to time constraints.</p>
<p>Regards</p>
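For illustration, here is a minimal sketch of how a proxy-aware connection pool and a thread pool might be combined. It rests on assumptions the question does not confirm: that the stall is related to the pool being smaller than the number of threads (`maxsize = 7` against 10 workers), and that swapping `workerpool` for the standard-library `concurrent.futures` is acceptable. The names `make_fetcher`, `fetch_all` and `WORKERS`, and the timeout values, are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor

WORKERS = 10  # assumed thread count, taken from the question's WorkerPool(size=10)


def make_fetcher(proxy_url, headers=None, workers=WORKERS):
    """Return a fetch(url) function that sends its requests through proxy_url."""
    # Third-party dependency, imported here so that fetch_all below can be
    # tried out without urllib3 installed.
    import urllib3

    # Assumption: give the pool at least one connection per worker thread,
    # so that no thread has to wait for a free connection.
    proxy = urllib3.ProxyManager(proxy_url, maxsize=workers, headers=headers)

    def fetch(url):
        # With the default preload_content=True the body is read in full and
        # the connection is returned to the pool before request() returns.
        # An explicit timeout turns a silent stall into an exception.
        response = proxy.request(
            "GET", url, timeout=urllib3.Timeout(connect=5.0, read=30.0)
        )
        return response.status, response.data

    return fetch


def fetch_all(urls, fetch, workers=WORKERS):
    """Run fetch over urls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces iteration, so an exception raised in any worker
        # surfaces here instead of being swallowed.
        return list(pool.map(fetch, urls))
```

Under these assumptions, usage would look like `fetch = make_fetcher(self._PROXY_URL, headers=self.headers)` followed by `pages = fetch_all([item['link'] for item in products], fetch)`. Whether this removes the stall depends on its actual cause, which the question leaves open; the timeout at least makes a hung request visible as an error.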