
Python urllib3 and proxy
I am trying to figure out how to use a proxy together with multithreading.

This code works:

    requester = urllib3.PoolManager(maxsize=10, headers=self.headers)
    thread_pool = workerpool.WorkerPool()
    thread_pool.map(grab_wrapper, [item['link'] for item in products])
    thread_pool.shutdown()
    thread_pool.wait()

Then in `grab_wrapper`:

    requested_page = requester.request('GET', url, assert_same_host=False, headers=self.headers)

The headers consist of: Accept, Accept-Charset, Accept-Encoding, Accept-Language and User-Agent.

But this does not work in production, since the requests have to go through a proxy (no authorization is required).

I tried different things (passing `proxies` to the request, in headers, etc.). The only thing that works is this:

    requester = urllib3.proxy_from_url(self._PROXY_URL, maxsize=7, headers=self.headers)
    thread_pool = workerpool.WorkerPool(size=10)
    thread_pool.map(grab_wrapper, [item['link'] for item in products])
    thread_pool.shutdown()
    thread_pool.wait()

Now, when I run the program, it makes 10 requests (10 threads) and then... stops. No error, no warning whatsoever. This is the only way I can get through the proxy, but it seems it's not possible to use `proxy_from_url` and `WorkerPool` together.

Any ideas how to combine those two into working code? I would rather avoid rewriting it in Scrapy, etc., due to time constraints.

Regards
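For reference, here is a minimal, self-contained sketch of what I am trying to get working, stripped of my class context. The proxy address and URL list are placeholders, `maxsize` is raised to match the number of threads, and the `try`/`except` in the worker is only there to print errors instead of letting a thread die silently:

    import urllib3
    import workerpool

    PROXY_URL = 'http://proxy.example.com:8080'  # placeholder, no auth required

    # Match maxsize to the number of worker threads so no connection is starved.
    requester = urllib3.proxy_from_url(PROXY_URL, maxsize=10)

    def grab_wrapper(url):
        try:
            response = requester.request('GET', url, assert_same_host=False)
            print(url, response.status)
        except Exception as exc:
            # Surface the error so a silent stop is easier to diagnose.
            print(url, 'failed:', exc)

    urls = ['http://example.com/page/%d' % i for i in range(20)]  # placeholder links

    thread_pool = workerpool.WorkerPool(size=10)
    thread_pool.map(grab_wrapper, urls)
    thread_pool.shutdown()
    thread_pool.wait()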