StackOverflow2013

Web data scraping (online news comments) with Scrapy (Python)
<p>I want to scrape comments on online news articles, purely for research, and I noticed that I have to learn Scrapy to do it.</p> <p>I usually program in Python, so I thought it would be easy to learn, but I have run into some problems.</p> <p>I want to scrape the news comments on <a href="http://news.yahoo.com/congress-wary--but-unlikely-to-blow-up-obama-s-iran-deal-230545228.html" rel="nofollow">http://news.yahoo.com/congress-wary--but-unlikely-to-blow-up-obama-s-iran-deal-230545228.html</a>.</p> <p>The problem is that the comments are only shown after clicking a button (View Comments (452)). In addition, I want to scrape all of the comments on that article, but each click on another button (View more comments) reveals only 10 more comments.</p> <p>How can I handle this problem?</p> <p>My code so far is below. Sorry that it is so rough.</p> <pre><code>import scrapy

from tutorial.items import DmozItem


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["news.yahoo.com"]
    start_urls = [
        "http://news.yahoo.com/blogs/oddnews/driver-offended-by-%E2%80%9Cwh0-r8x%E2%80%9D-license-plate-221720503.html",
    ]

    def parse(self, response):
        # Select every p element that is a direct child of a div
        for site in response.xpath('//div/p'):
            item = DmozItem()
            # 'text()' is relative to the current p element; '/text()' would
            # restart from the document root and match nothing
            item['title'] = site.xpath('text()').extract()
            yield item
</code></pre> <p>As you can see, there is a lot left to do to solve my problem, and I am short on time, but I will do my best anyway.</p>
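A common way to deal with a "View more comments" button is to skip the button altogether: such buttons usually fetch the next batch of comments with a background request, which you can find in the browser's developer tools (F12, Network tab) and then request directly, batch by batch. Below is a minimal sketch of that pagination loop, assuming the endpoint takes an offset and returns at most 10 comments per call (the batch size the question describes). `COMMENTS_ENDPOINT` and the `content_id`/`offset`/`count` parameters are hypothetical placeholders, not Yahoo's real API, and `fetch_page` is any function you supply that downloads one URL and returns the comment texts on it, which keeps the loop testable offline.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode

# HYPOTHETICAL endpoint and parameter names: look up the real request in the
# browser's Network tab after clicking "View more comments".
COMMENTS_ENDPOINT = "http://news.yahoo.com/hypothetical-comments-endpoint"
PAGE_SIZE = 10  # each "View more comments" click shows 10 more comments


def comment_page_url(article_id: str, offset: int, count: int = PAGE_SIZE) -> str:
    """Build the URL for one batch of comments (query parameters are assumptions)."""
    query = urlencode({"content_id": article_id, "offset": offset, "count": count})
    return f"{COMMENTS_ENDPOINT}?{query}"


def iter_all_comments(
    article_id: str, fetch_page: Callable[[str], List[str]]
) -> Iterator[str]:
    """Yield every comment by requesting successive batches, like repeated clicks."""
    offset = 0
    while True:
        batch = fetch_page(comment_page_url(article_id, offset))
        yield from batch
        # A short (or empty) batch means there is nothing left to load.
        if len(batch) < PAGE_SIZE:
            break
        offset += len(batch)
```

For example, `list(iter_all_comments("230545228", my_fetch))` would collect every comment, where `my_fetch` is a hypothetical downloader and using the numeric suffix of the article URL as its ID is an unverified guess. Inside a Scrapy spider, the same loop is usually written by having the callback yield a new `scrapy.Request` for the next offset instead of calling `fetch_page` in a loop, so downloads stay asynchronous.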
