StackOverflow2013


plurals

POhow to overwrite / use cookies in scrapy
primarykey
Id
10667202
data
AcceptedAnswerId
0
AnswerCount
3
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2012-05-19T17:01:29.893
FavoriteCount
5
LastActivityDate
2016-07-05T13:30:05.160
LastEditDate
2012-05-30T19:32:24.680
LastEditorUserId
183982
OwnerUserId
183982
ParentId
0
PostTypeId
1
Score
12
ViewCount
12084
LastEditorDisplayName
text
Body
I want to scrape <a href="http://www.3andena.com/" rel="noreferrer">http://www.3andena.com/</a>. The website loads in Arabic by default and stores the language setting in cookies. If I try to access the English version directly through the URL (<a href="http://www.3andena.com/home.php?sl=en" rel="noreferrer">http://www.3andena.com/home.php?sl=en</a>), the server returns an error. So I want to set the cookie "store_language" to "en", and then start scraping the website with that cookie value. I'm using CrawlSpider with a couple of Rules. Here's the code: <pre><code>from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy import log
from bkam.items import Product
from scrapy.http import Request
import re

class AndenaSpider(CrawlSpider):
    name = "andena"
    domain_name = "3andena.com"
    start_urls = ["http://www.3andena.com/Kettles/?objects_per_page=10"]

    product_urls = []

    rules = (
        # The following rule is for pagination
        Rule(SgmlLinkExtractor(allow=(r'\?page=\d+$'),), follow=True),
        # The following rule is for product details
        Rule(SgmlLinkExtractor(restrict_xpaths=('//div[contains(@class, "products-dialog")]//table//tr[contains(@class, "product-name-row")]/td'), unique=True), callback='parse_product', follow=True),
    )

    def start_requests(self):
        yield Request('http://3andena.com/home.php?sl=en', cookies={'store_language':'en'})

        for url in self.start_urls:
            yield Request(url, callback=self.parse_category)

    def parse_category(self, response):
        hxs = HtmlXPathSelector(response)

        self.product_urls.extend(hxs.select('//td[contains(@class, "product-cell")]/a/@href').extract())

        for product in self.product_urls:
            yield Request(product, callback=self.parse_product)

    def parse_product(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        item = Product()

        '''
        some parsing
        '''

        items.append(item)
        return items

SPIDER = AndenaSpider()
</code></pre> Here's the log: <pre><code>2012-05-30 19:27:13+0000 [andena] DEBUG: Redirecting (301) to <GET http://www.3andena.com/home.php?sl=en&xid_479d9=97656c0c5837f87b8c479be7c6621098> from <GET http://3andena.com/home.php?sl=en>
2012-05-30 19:27:14+0000 [andena] DEBUG: Redirecting (302) to <GET http://www.3andena.com/home.php?sl=en&xid_479d9=97656c0c5837f87b8c479be7c6621098> from <GET http://www.3andena.com/home.php?sl=en&xid_479d9=97656c0c5837f87b8c479be7c6621098>
2012-05-30 19:27:14+0000 [andena] DEBUG: Crawled (200) <GET http://www.3andena.com/Kettles/?objects_per_page=10> (referer: None)
2012-05-30 19:27:15+0000 [andena] DEBUG: Crawled (200) <GET http://www.3andena.com/B-and-D-Concealed-coil-pan-kettle-JC-62.html> (referer: http://www.3andena.com/Kettles/?objects_per_page=10)
</code></pre>
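At the HTTP level, "setting the cookie" means sending a `Cookie: store_language=en` header with each request. The snippet below is a minimal standard-library sketch of that idea. It is not Scrapy code and not a fix for the server error. It assumes the site reads the language only from the `store_language` cookie described above. The helpers `cookie_header` and `build_request` are hypothetical names used for illustration. The sketch only builds the request object and never contacts the site.

```python
import urllib.request
from http.cookies import SimpleCookie

# Assumption: the site picks the language from a cookie named "store_language"
# (the name given in the question), where "en" selects English.
LANGUAGE_COOKIE = {"store_language": "en"}


def cookie_header(cookies):
    """Serialize a dict of cookie names/values into a Cookie header value (hypothetical helper)."""
    jar = SimpleCookie()
    for name, value in cookies.items():
        jar[name] = value  # SimpleCookie quotes values if they need it
    # OutputString() renders only "name=value", with no Set-Cookie attributes.
    return "; ".join(morsel.OutputString() for morsel in jar.values())


def build_request(url, cookies=LANGUAGE_COOKIE):
    """Build (but do not send) a GET request that carries the cookies (hypothetical helper)."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})


if __name__ == "__main__":
    req = build_request("http://www.3andena.com/Kettles/?objects_per_page=10")
    print(req.full_url)
    print(req.get_header("Cookie"))  # store_language=en
```

In Scrapy, the `cookies={'store_language':'en'}` argument already passed to `Request` in `start_requests` is converted into a header of this form by the cookies middleware, which is enabled by default.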
Tags
<python><scrapy>
Title
how to overwrite / use cookies in scrapy
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USMahmoud M. Abdel-Fattah
UserOwnerUserId
1. USMahmoud M. Abdel-Fattah
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POhow to overwrite / use cookies in scrapy
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POhow to overwrite / use cookies in scrapy
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
