Sending email with attachments after crawling a website
<p>I'm using Scrapy for a school project to find dead links and missing pages. I've written pipelines that write text files with the relevant scraped information. I'm having trouble figuring out how to send an email at the end of the spider's run with the files that were made as attachments.</p>

<p>Scrapy has built-in email functionality and fires signals when the spider finishes, but putting everything together in a sensible way is eluding me. Any help would be greatly appreciated.</p>

<p>Here is my pipeline for creating the files with scraped data:</p>

<pre><code>class saveToFile(object):

    def __init__(self):
        # open the output files
        self.old = open('old_pages.txt', 'wb')
        self.date = open('pages_without_dates.txt', 'wb')
        self.missing = open('missing_pages.txt', 'wb')

        # write table headers
        line = "{0:15} {1:40} {2:} \n\n".format("Domain", "Last Updated", "URL")
        self.old.write(line)
        line = "{0:15} {1:} \n\n".format("Domain", "URL")
        self.date.write(line)
        line = "{0:15} {1:70} {2:} \n\n".format("Domain", "Page Containing Broken Link", "URL of Broken Link")
        self.missing.write(line)

    def process_item(self, item, spider):
        # append each item to the matching file as it is scraped
        if item['group'] == "Old Page":
            line = "{0:15} {1:40} {2:} \n".format(item['domain'], item["lastUpdated"], item["url"])
            self.old.write(line)
        elif item['group'] == "No Date On Page":
            line = "{0:15} {1:} \n".format(item['domain'], item["url"])
            self.date.write(line)
        elif item['group'] == "Page Not Found":
            line = "{0:15} {1:70} {2:} \n".format(item['domain'], item["referrer"], item["url"])
            self.missing.write(line)
        return item
</code></pre>

<p>I would like to create a separate pipeline item for sending the email. What I have so far is the following:</p>

<pre><code>from scrapy import signals
from scrapy.mail import MailSender
from scrapy.xlib.pydispatch import dispatcher

class emailResults(object):

    def __init__(self):
        dispatcher.connect(self.spider_closed, signals.spider_closed)

        # open the report files for reading so they can be attached
        old = open('old_pages.txt', 'rb')
        date = open('pages_without_dates.txt', 'rb')
        missing = open('missing_pages.txt', 'rb')
        oldOutput = open('twenty_oldest_pages.txt', 'rb')

        # stored on self so the signal handler can reach it;
        # note the commas between the tuples
        self.attachments = [
            ("old_pages", "text/plain", old),
            ("date", "text/plain", date),
            ("missing", "text/plain", missing),
            ("oldOutput", "text/plain", oldOutput),
        ]

        self.mailer = MailSender()

    def spider_closed(self, spider):
        self.mailer.send(to=["example@gmail.com"],
                         attachs=self.attachments,
                         subject="test email",
                         body="Some body")
</code></pre>

<p>It seems that in previous versions of Scrapy you could pass self into the spider_closed function, but in the current version (0.21) the spider_closed function is only passed the spider name.</p>

<p>Any help and/or advice would be greatly appreciated.</p>
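Independent of Scrapy's <code>MailSender</code>, the attachment mechanics the question is after can be sketched with Python's standard library alone. This is only a sketch under assumptions: <code>build_report_email</code> is a hypothetical helper name, the addresses are placeholders, and the file paths stand in for the report files above.

```python
# Sketch: assemble an email with text-file attachments using only the
# standard library (an alternative to scrapy.mail.MailSender).
# build_report_email is a hypothetical name; addresses are placeholders.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_report_email(attachment_paths):
    msg = MIMEMultipart()
    msg['Subject'] = "Crawl results"
    msg['From'] = "crawler@example.com"    # placeholder sender
    msg['To'] = "example@gmail.com"        # placeholder recipient
    msg.attach(MIMEText("Some body"))      # plain-text body part

    # attach each report file as a text/plain part
    for path in attachment_paths:
        with open(path, 'rb') as f:
            part = MIMEText(f.read().decode('utf-8'), 'plain')
        part.add_header('Content-Disposition', 'attachment', filename=path)
        msg.attach(part)
    return msg

# Actually sending it would then be something like:
#   import smtplib
#   smtplib.SMTP('localhost').send_message(msg)
```

The same idea applies whichever mail API is used: the files must exist and be opened for reading at the moment the message is built, which is why doing the work in a closed-spider handler (rather than at pipeline construction time) matters.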
 
