StackOverflow2013


plurals

PO
primarykey
Id
18837974
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-09-16T21:56:09.273
FavoriteCount
0
LastActivityDate
2013-09-16T21:56:09.273
LastEditDate
LastEditorUserId
0
OwnerUserId
2572383
ParentId
18823407
PostTypeId
2
Score
0
ViewCount
0
LastEditorDisplayName
text
Body
<p>To elaborate on my earlier comment, here's a sample spider that has 2 callbacks:</p>
<ul>
<li><code>parse()</code> would contain the logic you already have for the "main" page, yielding items from the main page and also yielding a fetch request for the detail page of each item</li>
<li><code>parse_detail_page()</code> would have different scraping logic, with other selectors, and would yield another class of items</li>
</ul>
<pre><code>from urlparse import urljoin  # Python 2 stdlib, as used by this Scrapy version

from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider
from scrapy.http import Request

# items are usually defined in yourproject/items.py
# from yourproject.items import MainPageItem, DetailPageItem
# I'm defining them here only to illustrate
from scrapy.item import Item, Field


class MainPageItem(Item):
    url = Field()
    name = Field()
    description = Field()


class DetailPageItem(Item):
    url = Field()
    title = Field()
    long_description = Field()
    image = Field()


class MySpider(BaseSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = [
        'http://www.example.com/1.html',
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # placeholder XPath: replace with the selector for your items
        for i in hxs.select('//selector/for/items'):
            item = MainPageItem()
            # placeholder XPath: adjust to wherever the detail link lives
            # inside each item; urljoin makes a relative link absolute
            item_url = urljoin(response.url, i.select('.//a/@href').extract()[0])
            item["url"] = item_url
            #item["name"] = item_name
            #item["description"] = item_description
            yield item

            # each item on the main page has a link,
            # so yield a Request for each one
            # and tell Scrapy to parse it with another callback
            yield Request(item_url, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        hxs = HtmlXPathSelector(response)
        item = DetailPageItem()
        item["url"] = response.url
        #item["title"] = title
        #item["long_description"] = long_description
        #item["image"] = image
        yield item
</code></pre>
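The `urljoin` step in the spider matters because links on the main page are often relative, while a `Request` needs an absolute URL. Here is a minimal sketch of that step on its own, using only the standard library. It is written for Python 3, where the function lives in `urllib.parse` instead of the Python 2 `urlparse` module. The helper name `resolve_links` and the sample hrefs are made up for illustration and are not part of Scrapy.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Turn the hrefs found on a page into absolute URLs.

    Hypothetical helper for illustration only; in a spider you would
    call urljoin(response.url, href) directly inside the callback.
    """
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == "__main__":
    main_page = "http://www.example.com/1.html"
    # made-up hrefs: relative, root-relative, and already absolute
    hrefs = ["details/42.html", "/img/a.png", "http://other.example.org/x"]
    for url in resolve_links(main_page, hrefs):
        print(url)
```

A relative href is resolved against the directory of the page, a root-relative one against the host, and an href that is already absolute is returned unchanged, so it is safe to apply the call to every link.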
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POScrape follow link with different scraper
  singulars
  PostTypePostTypeId
  PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USpaul trmbrth
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. This table or related slice is empty.


Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related rows are linked in either a to-one or a to-many fashion. The caption of each to-many relation links to a list view of the related table, filtered to the matching rows.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.