Scrapy not able to scrape https sites
    <p>I am new to scrapy, so this question may seem too simple. I had no problem with downloading from http sites . But when I tried to do the same for the <a href="https://www.bundesanzeiger.de/ebanzwww/wexsservlet?page.navid=to_nlp_start" rel="nofollow">this url</a>, I am getting the following error. </p> <p>code:</p> <pre><code>from scrapy.spider import BaseSpider class Bundspider(BaseSpider): name="bund" allowed_domains=["www.bundesanzeiger.de"] start_urls=[ "https://www.bundesanzeiger.de/ebanzwww/wexsservlet?page.navid=to_nlp_start" ] def parse(self, response): filename = response.url.split("/")[-2] open(filename, 'wb').write(response.body) </code></pre> <p>Error:</p> <pre><code>2013-03-20 01:20:54-0400 [scrapy] INFO: Scrapy 0.16.4 started (bot: tutorial) 2013-03-20 01:20:54-0400 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState 2013-03-20 01:20:54-0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats 2013-03-20 01:20:54-0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 2013-03-20 01:20:54-0400 [scrapy] DEBUG: Enabled item pipelines: 2013-03-20 01:20:54-0400 [bund] INFO: Spider opened 2013-03-20 01:20:54-0400 [bund] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2013-03-20 01:20:54-0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 2013-03-20 01:20:54-0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 2013-03-20 01:20:54-0400 [bund] ERROR: Error downloading &lt;GET https://www.bundesanzeiger.de/ebanzwww/wexsservlet?page.navid=to_nlp_start&gt;: [('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')] 2013-03-20 01:20:54-0400 [bund] INFO: Closing spider (finished) 2013-03-20 01:20:54-0400 [bund] INFO: Dumping Scrapy stats: {'downloader/exception_count': 1, 'downloader/exception_type_count/OpenSSL.SSL.Error': 1, 'downloader/request_bytes': 271, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2013, 3, 20, 5, 20, 54, 814159), 'log_count/DEBUG': 6, 'log_count/ERROR': 1, 'log_count/INFO': 4, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'start_time': datetime.datetime(2013, 3, 20, 5, 20, 54, 796438)} 2013-03-20 01:20:54-0400 [bund] INFO: Spider closed (finished) </code></pre>