Scrapy Crawl URLs in Order
    <p>So, my problem is relatively simple. I have one spider crawling multiple sites, and I need it to return the data in the order I write it in my code. It's posted below.</p>
    <pre><code>from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from mlbodds.items import MlboddsItem

class MLBoddsSpider(BaseSpider):
    name = "sbrforum.com"
    allowed_domains = ["sbrforum.com"]
    start_urls = [
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="col_3"]//div[@id="module3_1"]//div[@id="moduleData4952"]')
        items = []
        for site in sites:
            item = MlboddsItem()
            item['header'] = site.select('//div[@class="scoreboard-bar"]//h2//span[position()&gt;1]//text()').extract()# | /*//table[position()&lt;2]//tr//th[@colspan="2"]//text()').extract()
            item['game1'] = site.select('/*//table[position()=1]//tr//td[@class="tbl-odds-c2"]//text() | /*//table[position()=1]//tr//td[@class="tbl-odds-c4"]//text() | /*//table[position()=1]//tr//td[@class="tbl-odds-c6"]//text()').extract()
            items.append(item)
        return items
</code></pre>
    <p>The results are returned in a random order, for example it returns the 29th, then the 28th, then the 30th. I've tried changing the scheduler order from DFO to BFO, just in case that was the problem, but that didn't change anything.</p>
    <p>Thanks in advance.</p>
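To make the ordering requirement concrete, here is a minimal sketch in plain Python. It is not a Scrapy feature. It assumes each scraped item also records the URL of the page it came from, in a hypothetical "url" key that the item in the question does not have, and it restores the order of start_urls after the crawl rather than changing the order in which pages are fetched.

```python
# Sketch only: restores start_urls order after an asynchronous crawl.
# Assumption: each scraped item is a dict with a hypothetical "url" key
# holding the URL of the page it came from (not in the original item).

START_URLS = [
    "http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/",
    "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
    "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/",
]


def sort_by_start_order(items, start_urls):
    """Return items ordered by the position of their source URL in start_urls."""
    position = {url: index for index, url in enumerate(start_urls)}
    # sorted() is stable, so items from the same page keep their relative order.
    return sorted(items, key=lambda item: position[item["url"]])


# Items in the arrival order described in the question: 29th, 28th, 30th.
arrived = [
    {"url": START_URLS[1], "header": "29th"},
    {"url": START_URLS[0], "header": "28th"},
    {"url": START_URLS[2], "header": "30th"},
]
ordered = sort_by_start_order(arrived, START_URLS)
```

Sorting after the fact leaves the crawl itself concurrent. Whether that is acceptable depends on whether the order matters while crawling or only in the final output.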