Scraping a website by following href references
<p>I am using Scrapy, and I want to scrape www.rentler.com. I went to the website and searched for the city I am interested in; this is the link to that search result:</p>
<pre><code>https://www.rentler.com/search?Location=millcreek&amp;MaxPrice=</code></pre>
<p>All of the listings I am interested in are on that page, and I want to follow each of them, one by one.</p>
<p>Each listing sits under:</p>
<pre><code>&lt;body&gt;/&lt;div id="wrap"&gt;/&lt;div class="container search-res"&gt;/&lt;ul class="search-results"&gt;&lt;li class="result"&gt;</code></pre>
<p>Each result has an <code>&lt;a class="search-result-link" href="/listing/288910"&gt;</code>.</p>
<p>I know that I need to create a rule for the CrawlSpider that looks at that href and appends it to the base URL. That way it could visit each listing page and grab the data I am interested in.</p>
<p>I think I need something like this:</p>
<pre><code>rules = (
    # Not sure what pattern goes in allow; this is where I think the href matching belongs.
    Rule(SgmlLinkExtractor(allow=r'...'), callback='parse_item', follow=True),
)
</code></pre>
<p><strong>UPDATE</strong> <em>Thank you for the input. Here is what I have now; it seems to run, but it does not scrape anything:</em></p>
<pre><code>import re

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector

from KSL.items import KSLitem


class KSL(CrawlSpider):
    name = "ksl"
    allowed_domains = ["https://www.rentler.com"]
    start_urls = ["https://www.rentler.com/ksl/listing/index/?sid=17403849&amp;nid=651&amp;ad=452978"]

    regex_pattern = '&lt;a href="listing/(.*?) class="search-result-link"&gt;'

    def parse_item(self, response):
        items = []
        hxs = HtmlXPathSelector(response)
        sites = re.findall(regex_pattern, "https://www.rentler.com/search?location=millcreek&amp;MaxPrice=")
        for site in sites:
            item = KSLitem()
            item['price'] = site.select('//div[@class="price"]/text()').extract()
            item['address'] = site.select('//div[@class="address"]/text()').extract()
            item['stats'] = site.select('//ul[@class="basic-stats"]/li/div[@class="count"]/text()').extract()
            item['description'] = site.select('//div[@class="description"]/div/p/text()').extract()
            items.append(item)
        return items
</code></pre>
<p>Thoughts?</p>