Running scrapy from inside Python script - CSV exporter doesn't work
My scraper works fine when I run it from the command line, but when I try to run it from within a Python script (with the method outlined [here](https://stackoverflow.com/questions/14777910/scrapy-crawl-from-script-always-blocks-script-execution-after-scraping/14802526#14802526) using Twisted) it does not output the two CSV files that it normally does. I have a pipeline that creates and populates these files, one of them using CsvItemExporter() and the other using writeCsvFile(). Here is the code:

```python
class CsvExportPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        nodes = open('%s_nodes.csv' % spider.name, 'w+b')
        self.files[spider] = nodes
        self.exporter1 = CsvItemExporter(nodes, fields_to_export=['url', 'name', 'screenshot'])
        self.exporter1.start_exporting()
        self.edges = []
        self.edges.append(['Source', 'Target', 'Type', 'ID', 'Label', 'Weight'])
        self.num = 1

    def spider_closed(self, spider):
        self.exporter1.finish_exporting()
        file = self.files.pop(spider)
        file.close()
        writeCsvFile(getcwd() + r'\edges.csv', self.edges)

    def process_item(self, item, spider):
        self.exporter1.export_item(item)
        for url in item['links']:
            self.edges.append([item['url'], url, 'Directed', self.num, '', 1])
            self.num += 1
        return item
```

Here is my file structure:

```
SiteCrawler/              # the CSVs are normally created in this folder
    runspider.py          # this is the script that runs the scraper
    scrapy.cfg
    SiteCrawler/
        __init__.py
        items.py
        pipelines.py
        screenshooter.py
        settings.py
        spiders/
            __init__.py
            myfuncs.py
            sitecrawler_spider.py
```

The scraper appears to function normally in all other ways. The output at the end in the command line suggests that the expected number of pages were crawled, and the spider appears to have finished normally. I am not getting any error messages.

---- EDIT: ----

Inserting print statements and syntax errors into the pipeline has no effect, so it appears that the pipeline is being ignored. Why might this be?

Here is the code for the script that runs the scraper (runspider.py):

```python
from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy.xlib.pydispatch import dispatcher
import logging

from SiteCrawler.spiders.sitecrawler_spider import MySpider

def stop_reactor():
    reactor.stop()

dispatcher.connect(stop_reactor, signal=signals.spider_closed)

spider = MySpider()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()

log.start(loglevel=logging.DEBUG)
log.msg('Running reactor...')
reactor.run()  # the script will block here until the spider is closed
log.msg('Reactor stopped.')
```
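
Since print statements (and even deliberate syntax errors) in the pipeline have no effect, one cause worth ruling out is that the pipeline is never registered at all: `Crawler(Settings())` builds the crawler from Scrapy's default settings rather than the project's `settings.py`, so `ITEM_PIPELINES` may never be applied when the spider is launched from the script. Below is a minimal sketch of `runspider.py` that loads the project settings explicitly via `scrapy.utils.project.get_project_settings()`; it assumes the same old-style script-driven `Crawler` API used above and is a diagnostic suggestion rather than a confirmed fix.

```python
from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings  # reads scrapy.cfg / the project settings module
from scrapy.xlib.pydispatch import dispatcher
import logging

from SiteCrawler.spiders.sitecrawler_spider import MySpider

def stop_reactor():
    reactor.stop()

dispatcher.connect(stop_reactor, signal=signals.spider_closed)

# Loading the project settings (instead of a bare Settings()) is what makes
# ITEM_PIPELINES, and therefore CsvExportPipeline, take effect in this run.
settings = get_project_settings()
spider = MySpider()
crawler = Crawler(settings)
crawler.configure()
crawler.crawl(spider)
crawler.start()

log.start(loglevel=logging.DEBUG)
log.msg('Running reactor...')
reactor.run()  # blocks until the spider is closed
log.msg('Reactor stopped.')
```

For `get_project_settings()` to find `SiteCrawler/settings.py`, the script has to be run from a directory where `scrapy.cfg` is visible (here, the outer SiteCrawler/ folder), or `SCRAPY_SETTINGS_MODULE` has to be set in the environment.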