Using Scrapy to parse site, follow Next Page, write as XML
<p>My script works wonderfully when I comment out one piece of code: <strong>return items</strong>.</p>
<p>Here is my code, with the site changed to <a href="http://example.com" rel="nofollow">http://example.com</a>, since that appears to be what other people do, possibly to avoid 'scraping' legality issues.</p>
<pre><code>class Vfood(CrawlSpider):
    name = "example.com"
    allowed_domains = [ "example.com" ]
    start_urls = [
        "http://www.example.com/TV_Shows/Show/Episodes"
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=('example\.com', 'page='), restrict_xpaths = '//div[@class="paginator"]/span[@id="next"]'), callback='parse'),
    )

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        countries = hxs.select('//div[@class="index-content"]')
        tmpNextPage = hxs.select('//div[@class="paginator"]/span[@id="next"]/a/@href').extract()
        for country in countries:
            item = FoodItem()
            countryName = country.select('.//h3/text()').extract()
            item['country'] = countryName
            print "Country Name: ", countryName
            shows = country.select('.//div[@class="content1"]')
            for show in shows.select('.//div'):
                showLink = (show.select('.//h4/a/@href').extract()).pop()
                showLocation = show.select('.//h4/a/text()').extract()
                showText = show.select('.//p/text()').extract()
                item['showURL'] = "http://www.travelchannel.com"+str(showLink)
                item['showcity'] = showLocation
                item['showtext'] = showText
                item['showtext'] = showText
                print "\t", showLink
                print "\t", showLocation
                print "\t", showText
                print "\n"
                items.append(item)
        #return items

        for NextPageLink in tmpNextPage:
            m = re.search("Location", NextPageLink)
            if m:
                NextPage = NextPageLink
                print "Next Page: ", NextPage
                yield Request("http://www.example.com/"+NextPage, callback = self.parse)
            else:
                NextPage = 'None'

SPIDER = food()
</code></pre>
<p>If I UNCOMMENT the #return items line, I get the following error:</p>
<pre><code>yield Request("http://www.example.com/"+NextPage, callback = self.parse)
SyntaxError: 'return' with argument inside generator
</code></pre>
<p>With that line left commented out, I am unable to collect the data in XML format, although the print statements show everything I expect to see on the screen.</p>
<p>My command for getting XML out:</p>
<pre><code>scrapy crawl example.com --set FEED_URI=food.xml --set FEED_FORMAT=xml
</code></pre>
<p>The XML file is created when I UNCOMMENT the <strong><em>return items</em></strong> line above, but the script then stops and won't follow the links.</p>
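<p>The conflict described above comes from mixing <code>return items</code> with <code>yield</code> in the same function: once a function contains <code>yield</code> it is a generator, and Python 2 rejects a <code>return</code> with a value inside one. A minimal sketch of one possible way around this, assuming the callback may yield items and follow-up requests from the same generator (Scrapy callbacks accept an iterable of both). <code>FakeRequest</code>, the <code>page</code> dictionary, and the field values are hypothetical stand-ins so the sketch runs without Scrapy installed; they are not part of the original spider.</p>

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy's Request, used only for illustration."""
    url: str


def parse(page):
    """Yield every scraped item, then a request for the next page if any.

    `page` is an assumed, already-extracted dictionary; in a real spider
    these values would come from XPath selections on the response.
    """
    for show in page["shows"]:
        # Yield each item as soon as it is built instead of collecting a list.
        yield {"country": page["country"], "showcity": show}

    next_page = page.get("next")
    if next_page:
        # Yield the follow-up request from the same generator.
        yield FakeRequest("http://www.example.com/" + next_page)


page = {"country": "Italy", "shows": ["Rome", "Naples"], "next": "Location?page=2"}
results = list(parse(page))
```

<p>Here <code>results</code> holds the two items followed by the request for the next page, so nothing has to be returned as a list.</p>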