<p>Building on <a href="http://groups.google.com/group/scrapy-users/browse_thread/thread/9ac290ed469887f1" rel="noreferrer">the solution from Pablo Hoffman</a>, you can apply the following decorator to the <code>process_item</code> method of a Pipeline object so that it checks the <code>pipeline</code> attribute of your spider to decide whether it should run. For example:</p>
<pre><code>import functools

from scrapy import log  # legacy Scrapy logging module, provides log.DEBUG


def check_spider_pipeline(process_item_method):

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):

        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=log.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=log.DEBUG)
            return item

    return wrapper
</code></pre>
<p>For this decorator to work correctly, the spider must have a <code>pipeline</code> attribute holding a container of the Pipeline classes that you want to use to process the item, for example:</p>
<pre><code>class MySpider(BaseSpider):

    pipeline = set([
        pipelines.Save,
        pipelines.Validate,
    ])

    def parse(self, response):
        # insert scrapy goodness here
        return item
</code></pre>
<p>And then in a <code>pipelines.py</code> file:</p>
<pre><code>class Save(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do saving here
        return item


class Validate(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do validating here
        return item
</code></pre>
<p>All Pipeline objects must still be listed in <code>ITEM_PIPELINES</code> in the settings, in the correct order. It would be nice if the order could be specified on the spider, too.</p>
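<p>To see the skip/execute behaviour without a running crawl, here is a minimal sketch that exercises the same decorator logic outside Scrapy. <code>FakeSpider</code> and <code>run_pipelines</code> are hypothetical stand-ins invented for this illustration (they are not Scrapy APIs), and the standard <code>logging</code> module replaces <code>scrapy.log</code> so the snippet has no dependencies.</p>

```python
import functools
import logging


def check_spider_pipeline(process_item_method):
    """Run process_item only if the pipeline class is listed on the spider."""

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)

        # class not enabled for this spider: pass the item through untouched
        spider.log(msg % 'skipping', level=logging.DEBUG)
        return item

    return wrapper


class Save(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        item['saved'] = True  # placeholder for real saving
        return item


class Validate(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        item['validated'] = True  # placeholder for real validation
        return item


class FakeSpider(object):
    """Hypothetical stand-in exposing only what the decorator touches."""

    # only Validate is enabled for this spider
    pipeline = set([Validate])

    def __init__(self):
        self.messages = []

    def log(self, message, level=logging.DEBUG):
        # record messages instead of emitting them, so they can be inspected
        self.messages.append(message)


def run_pipelines(item, spider, steps):
    """Hypothetical helper: call each pipeline step in order."""
    for step in steps:
        item = step.process_item(item, spider)
    return item


if __name__ == '__main__':
    spider = FakeSpider()
    print(run_pipelines({}, spider, [Validate(), Save()]))
    print(spider.messages)
```

<p>Because <code>Save</code> is not in <code>FakeSpider.pipeline</code>, its step returns the item unchanged, while <code>Validate</code> runs normally. Membership is tested against classes rather than instances, which is why the spider lists <code>Validate</code> itself.</p>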
 
