<p>Not sure if you still need this, but I have put together an example. If you have a specific website in mind, we can take a look at it.</p>
<pre><code>from scrapy.http import Request
from scrapy.spider import BaseSpider


class TestSpider(BaseSpider):
    name = "TEST"
    allowed_domains = ["example.com", "example.iana.org"]

    def __init__(self, **kwargs):
        super(TestSpider, self).__init__(**kwargs)
        self.url = "http://www.example.com"
        self.max_loop = 3
        self.loop = 0  # We want it to loop 3 times, so keep a counter on the spider

    def start_requests(self):
        # I'll write it out more explicitly here
        print "OPEN"
        checkRequest = Request(
            url=self.url,
            meta={"test": "first"},
            callback=self.checker
        )
        return [checkRequest]

    def checker(self, response):
        # I wasn't sure about a specific website that gives 302,
        # so I just used 200. We need the loop counter or it will keep going.
        if self.loop &lt; self.max_loop and response.status == 200:
            print "RELOOPING", response.status, self.loop, response.meta['test']
            self.loop += 1
            checkRequest = Request(
                url=self.url,
                callback=self.checker
            ).replace(meta={"test": "not first"})
            return [checkRequest]
        else:
            print "END LOOPING"
            self.results(response)  # No need to return, just call method

    def results(self, response):
        print "DONE"
        # Do stuff here
</code></pre>
<p>In settings.py, set this option:</p>
<pre><code>DUPEFILTER_CLASS = 'scrapy.dupefilter.BaseDupeFilter'
</code></pre>
<p>This is what turns off the filter for duplicate requests. The name is confusing: BaseDupeFilter is not the default filter, and it does not actually filter anything. With it set, the spider can submit three repeat requests for the same URL, each of which goes back through the checker method. I am using Scrapy 0.16, and the output is:</p>
<pre><code>&gt;scrapy crawl TEST
&gt;OPEN
&gt;RELOOPING 200 0 first
&gt;RELOOPING 200 1 not first
&gt;RELOOPING 200 2 not first
&gt;END LOOPING
&gt;DONE
</code></pre>
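To see why the duplicate filter matters here, the following is a rough, Scrapy-free model of the loop in plain Python 3. It is only a sketch under stated assumptions: `SeenSetFilter`, `NoOpFilter`, and `run_loop` are hypothetical names invented for illustration, the filters compare bare URLs rather than real request fingerprints, and every fetch is assumed to return status 200. It is not how Scrapy is implemented.

```python
class SeenSetFilter:
    """Toy stand-in for a filter that remembers requests (hypothetical)."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, url):
        # Report a duplicate if this URL was already scheduled once.
        if url in self.seen:
            return True
        self.seen.add(url)
        return False


class NoOpFilter:
    """Toy stand-in for a filter that never reports duplicates (hypothetical)."""

    def request_seen(self, url):
        return False


def run_loop(dupefilter, url="http://www.example.com", max_loop=3):
    """Simulate the checker loop and return the lines it would print."""
    log = ["OPEN"]
    loop = 0
    pending = [(url, "first")]  # queue of (url, meta tag) pairs
    while pending:
        current_url, tag = pending.pop(0)
        if dupefilter.request_seen(current_url):
            log.append("DROPPED duplicate")
            continue
        status = 200  # assumption: every fetch succeeds
        if loop < max_loop and status == 200:
            log.append("RELOOPING %d %d %s" % (status, loop, tag))
            loop += 1
            pending.append((current_url, "not first"))
        else:
            log.append("END LOOPING")
            log.append("DONE")
    return log


if __name__ == "__main__":
    print("\n".join(run_loop(NoOpFilter())))
    print("---")
    print("\n".join(run_loop(SeenSetFilter())))
```

With the no-op filter the model reproduces the crawl output above; with the remembering filter the second request for the same URL is dropped and the loop stops after one pass. If disabling filtering for the whole project is too broad, Scrapy's `Request` also accepts `dont_filter=True`, which exempts a single request from the duplicate filter.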