Scraping data out of Facebook using Scrapy
<p>The new Graph Search on Facebook lets you search for the current employees of a company using a query token, for example <strong>Current Google employees</strong>.</p>
<p>I want to scrape the results page (<a href="http://www.facebook.com/search/104958162837/employees/present" rel="nofollow">http://www.facebook.com/search/104958162837/employees/present</a>) with Scrapy.</p>
<p>The initial problem was that Facebook only lets a logged-in user access this information, so I was redirected to login.php. So, before scraping this URL, I log in via Scrapy and then request the results page. But even though the HTTP response for this page is 200, it does not scrape any data. The code is as follows:</p>
<pre><code>import sys
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item
from scrapy.http import Request

class DmozSpider(BaseSpider):
    name = "test"
    start_urls = ['https://www.facebook.com/login.php'];
    task_urls = [query]

    def parse(self, response):
        return [FormRequest.from_response(response, formname='login_form',formdata={'email':'myemailid','pass':'myfbpassword'}, callback=self.after_login)]

    def after_login(self,response):
        if "authentication failed" in response.body:
            self.log("Login failed",level=log.ERROR)
            return
        return Request(query, callback=self.page_parse)

    def page_parse(self,response):
        hxs = HtmlXPathSelector(response)
        print hxs
        items = hxs.select('//div[@class="_4_yl"]')
        count = 0
        print items
</code></pre>
<p>What could I have missed or done incorrectly?</p>
<p>Thanks in advance.</p>
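<p>One way to narrow this down might be to check whether the target markup is present in the raw response body at all, independently of Scrapy's selectors. The sketch below is a standard-library-only diagnostic, not part of the original spider: the helper name <code>count_elements_with_class</code> and the <code>ClassCounter</code> class are hypothetical, and it is written for Python 3 rather than the Python 2 used above. It matches a class token, which is slightly looser than the exact-match XPath <code>@class="_4_yl"</code>, and it does not count markup that only appears inside HTML comments.</p>

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags of one name whose class attribute includes a given class."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, tag, class_name):
    """Hypothetical helper: how many <tag> elements in `html` carry `class_name`."""
    parser = ClassCounter(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.count
```

<p>Inside <code>page_parse</code>, this could be called on the decoded response body with <code>"div"</code> and <code>"_4_yl"</code>. A count of zero would suggest that the elements are not in the HTML Scrapy received (for instance because the page fills them in later with JavaScript), while a non-zero count would point back at the XPath expression or at the fact that <code>page_parse</code> prints the selection but never returns any items.</p>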