    <p>To elaborate on my earlier comment, here's a sample spider that has two callbacks:</p>
    <ul>
    <li><code>parse()</code> would contain the logic you already have for the "main" page: it yields the items found on the main page and also yields a request for the detail page of each item</li>
    <li><code>parse_detail_page()</code> would have its own scraping logic, with other selectors, and would yield another class of items</li>
    </ul>
    <p>A sketch of the spider:</p>
    <pre><code>from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider
from scrapy.http import Request
#import urlparse

# items are usually defined in yourproject.items.py
# from yourproject.items import MainPageItem, DetailPageItem
# I'm defining them here only to illustrate
from scrapy.item import Item, Field


class MainPageItem(Item):
    url = Field()
    name = Field()
    description = Field()


class DetailPageItem(Item):
    url = Field()
    title = Field()
    long_description = Field()
    image = Field()


class MySpider(BaseSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = [
        'http://www.example.com/1.html',
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for i in hxs.select('//selector/for/items').extract():
            item = MainPageItem()
            #item["url"] = item_url
            #item["name"] = item_page
            #item["description"] = item_description
            yield item

            # each item on Main page has a link
            # so yield a Request for each one
            # and tell Scrapy to parse it within another callback
            # NOTE: item_url is a placeholder; extract it from the item
            # before this point, otherwise this line raises a NameError
            #item_url = urlparse.urljoin(response.url, item_url)
            yield Request(item_url, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        hxs = HtmlXPathSelector(response)
        item = DetailPageItem()
        item["url"] = response.url
        #item["title"] = title
        #item["long_description"] = long_description
        #item["image"] = image
        yield item
</code></pre>
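To make the callback chaining easier to see, here is a toy model in plain Python, without Scrapy. It is only a sketch under stated assumptions: the `Request` class, `FAKE_SITE`, and `crawl()` are hypothetical stand-ins written for this illustration, not Scrapy's actual classes or scheduler. The idea it shows is the same, though: a callback may yield both items and requests, and each request names the callback that will handle its response.

```python
from collections import deque


class Request:
    """Hypothetical stand-in for a request: a URL plus the callback that parses it."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


# Hypothetical site: one main page that links to two detail pages.
FAKE_SITE = {
    "http://www.example.com/1.html": [
        "http://www.example.com/a.html",
        "http://www.example.com/b.html",
    ],
    "http://www.example.com/a.html": [],
    "http://www.example.com/b.html": [],
}


def parse(url, links):
    """Main-page callback: yield one item per link, then a request for its detail page."""
    for link in links:
        yield {"type": "main", "url": link}
        yield Request(link, callback=parse_detail_page)


def parse_detail_page(url, links):
    """Detail-page callback: yield a different kind of item."""
    yield {"type": "detail", "url": url}


def crawl(start_url):
    """Toy scheduler: collect items, queue requests, dispatch each to its callback."""
    items = []
    queue = deque([Request(start_url, callback=parse)])
    while queue:
        request = queue.popleft()
        links = FAKE_SITE[request.url]  # pretend this is the downloaded page
        for result in request.callback(request.url, links):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl("http://www.example.com/1.html")
for item in items:
    print(item["type"], item["url"])
```

In this toy the requests are processed strictly in queue order; a real crawler downloads pages concurrently, so the order in which detail items arrive should not be relied on.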