<p>Use <a href="http://scrapy.org/" rel="noreferrer">Scrapy</a>.</p>
<p>It is a Twisted-based web crawler framework. It is still under heavy development, but it already works. It has many goodies:</p>
<ul>
<li>Built-in support for parsing HTML, XML, CSV, and JavaScript</li>
<li>A media pipeline for scraping items with images (or any other media) and downloading the image files as well</li>
<li>Support for extending Scrapy by plugging in your own functionality using middlewares, extensions, and pipelines</li>
<li>A wide range of built-in middlewares and extensions for handling compression, cache, cookies, authentication, user-agent spoofing, robots.txt, statistics, crawl depth restriction, etc.</li>
<li>Interactive scraping shell console, very useful for developing and debugging</li>
<li>Web management console for monitoring and controlling your bot</li>
<li>Telnet console for low-level access to the Scrapy process</li>
</ul>
<p>Example code to extract information about all torrent files added today on the <a href="http://www.mininova.org/" rel="noreferrer">mininova</a> torrent site, using an XPath selector on the HTML returned:</p>
<pre><code>class Torrent(ScrapedItem):
    pass

class MininovaSpider(CrawlSpider):
    domain_name = 'mininova.org'
    start_urls = ['http://www.mininova.org/today']
    # Follow links to individual torrent pages (/tor/&lt;id&gt;) and parse each one.
    rules = [Rule(RegexLinkExtractor(allow=[r'/tor/\d+']), 'parse_torrent')]

    def parse_torrent(self, response):
        x = HtmlXPathSelector(response)
        torrent = Torrent()

        # Fill the item from the page using XPath expressions.
        torrent.url = response.url
        torrent.name = x.x("//h1/text()").extract()
        torrent.description = x.x("//div[@id='description']").extract()
        torrent.size = x.x("//div[@id='info-left']/p[2]/text()[2]").extract()
        return [torrent]
</code></pre>
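To see what the three XPath expressions in the spider are aiming at, here is a minimal sketch that applies roughly equivalent paths to a small hand-written page using only Python's standard library. This is not Scrapy, and it is not the real mininova markup: the `SAMPLE_PAGE` content and the `extract_torrent` helper are assumptions made up for illustration. `xml.etree.ElementTree` supports only a subset of XPath, so the `text()[2]` step is approximated by reading the text that follows the first child element of the second paragraph, and the description is read as text rather than as the full element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a torrent detail page.
SAMPLE_PAGE = """
<html>
  <body>
    <h1>Example torrent</h1>
    <div id="description">An illustrative description.</div>
    <div id="info-left">
      <p><strong>Category:</strong> Misc</p>
      <p>
        <strong>Total size:</strong>
        150.62 megabyte
      </p>
    </div>
  </body>
</html>
""".strip()


def extract_torrent(page):
    """Pull name, description and size out of the sample page."""
    root = ET.fromstring(page)

    # Roughly //h1/text()
    name = root.findtext(".//h1")

    # Roughly //div[@id='description'], reduced to its text content
    description = root.findtext(".//div[@id='description']")

    # Roughly //div[@id='info-left']/p[2]/text()[2]:
    # the paragraph's first text node is the whitespace before <strong>,
    # the second is the text that trails the <strong> element.
    size_paragraph = root.find(".//div[@id='info-left']/p[2]")
    size = size_paragraph[0].tail.strip()

    return {"name": name, "description": description, "size": size}


if __name__ == "__main__":
    print(extract_torrent(SAMPLE_PAGE))
```

Real pages are rarely well-formed XML, which is one reason to use a scraping framework's own HTML selectors rather than this approach outside of small experiments.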
 
