<blockquote>
<p>At first I thought scrapyd was made for this, but after reading the doc, it seems that it's more a daemon able to manage 'packaged spiders', aka 'scrapy eggs'; and that all the settings (start_urls, allowed_domains, settings) must still be hardcoded in the 'scrapy egg' itself; so it doesn't look like a solution to my question, unless I missed something.</p>
</blockquote>
<p>I disagree with the statement above: start_urls need not be hard-coded. They can be passed to the spider class dynamically, as an argument in the scheduling request, like this:</p>
<pre><code>curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider -d setting=DOWNLOAD_DELAY=2 -d arg1=val1
</code></pre>
<p>Alternatively, the spider can retrieve its URLs from a database or a file. I get them from a database like this:</p>
<pre><code>import urllib

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

# MovieItem is this project's own Item class.


class WikipediaSpider(BaseSpider):
    name = 'wikipedia'
    allowed_domains = ['wikipedia.com']
    start_urls = []

    def __init__(self, name=None, url=None, **kwargs):
        item = MovieItem()
        item['spider'] = self.name
        # You can pass a specific url to retrieve
        if url:
            if name is not None:
                self.name = name
            elif not getattr(self, 'name', None):
                raise ValueError("%s must have a name" % type(self).__name__)
            self.__dict__.update(kwargs)
            self.start_urls = [url]
        else:
            # If there is no specific URL get it from Database
            wikiliks = # &lt; -- CODE TO RETRIEVE THE LINKS FROM DB --&gt;
            if wikiliks is None:
                print "**************************************"
                print "No Links to Query"
                print "**************************************"
                return None

            for link in wikiliks:
                # SOME PROCESSING ON THE LINK GOES HERE
                self.start_urls.append(urllib.unquote_plus(link[0]))

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Remaining parse code goes here
</code></pre>
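<p>The same scheduling request can also be assembled from Python. The following is only a minimal sketch using the standard library: the helper name <code>build_schedule_request</code> is hypothetical, the endpoint and field names are taken from the curl command above, and it assumes scrapyd forwards any extra field (here <code>url</code>) to the spider's <code>__init__</code> as a keyword argument. The request is built but not sent.</p>

```python
from urllib.parse import parse_qsl, urlencode
from urllib.request import Request

# Same endpoint as in the curl example; adjust host and port for your scrapyd instance.
SCRAPYD_URL = "http://localhost:6800/schedule.json"


def build_schedule_request(project, spider, settings=None, **spider_args):
    """Build (but do not send) a POST request for scrapyd's schedule.json.

    Hypothetical helper for illustration: `settings` becomes repeated
    `setting=KEY=VALUE` fields, and every extra keyword becomes a spider argument.
    """
    fields = [("project", project), ("spider", spider)]
    for key, value in (settings or {}).items():
        fields.append(("setting", "{}={}".format(key, value)))
    fields.extend(spider_args.items())
    body = urlencode(fields).encode("utf-8")
    return Request(SCRAPYD_URL, data=body, method="POST")


# Schedule the 'wikipedia' spider with a start URL supplied at run time.
request = build_schedule_request(
    "myproject",
    "wikipedia",
    settings={"DOWNLOAD_DELAY": 2},
    url="http://en.wikipedia.org/wiki/Scrapy",
)
print(request.full_url)
print(parse_qsl(request.data.decode("utf-8")))
```

<p>To actually schedule the job, the request could be passed to <code>urllib.request.urlopen(request)</code> while scrapyd is running; the spider then receives <code>url</code> in its constructor and uses it as its only start URL.</p>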