How to include the start URL in the "allow" rule in SgmlLinkExtractor using a Scrapy crawl spider
<p>I have searched many topics but cannot find an answer to my specific question. I created a crawl spider for a website and it works perfectly. I then made a similar one to crawl a similar website, but this time I have a small issue. Down to business:</p>
<p>My start URL is www.example.com. The page contains the links I want my spider to process, which look like:</p>
<ul> <li>www.example.com/locationA</li> <li>www.example.com/locationB</li> <li>www.example.com/locationC</li> </ul>
<p>...</p>
<p>Here is the issue: every time the start URL is requested, it automatically redirects to www.example.com/locationA, and the links my spider processes include only:</p>
<ul> <li>www.example.com/locationB</li> <li>www.example.com/locationC ...</li> </ul>
<p>So my problem is how to include www.example.com/locationA in the returned URLs. The log shows:</p>
<p>-2011-11-28 21:25:33+1300 [example.com] DEBUG: Redirecting (302) to from http://www.example.com/></p>
<p>-2011-11-28 21:25:34+1300[example.com] DEBUG: Redirecting (302) to (referer: None)</p>
<ul> <li>2011-11-28 21:25:37+1300 [example.com] DEBUG: Redirecting (302) to (referer: www.example.com/locationB)</li> </ul>
<p>Print out from parse_item: www.example.com/locationB</p>
<p>....</p>
<p>I think the issue might somehow be related to that (referer: None). Could anyone please shed some light on this?</p>
<p>I have narrowed the issue down by changing the start URL to www.example.com/locationB. Since every page contains the list of all locations, this time my spider processes:</p>
<p>-www.example.com/locationA</p>
<p>-www.example.com/locationC ...</p>
<p>In a nutshell, I am looking for a way to include the URL that is the same as the start URL (or that the start URL redirects to) in the list that the parse_item callback will work on.</p>