
PhantomJS and pjscrape - Failing on some multiple URLs
    <p><strong>Overview</strong></p>
<p>I am trying to create a very basic scraper with PhantomJS and the pjscrape framework.</p>
<p><strong>My Code</strong></p>
<pre class="lang-js prettyprint-override"><code>pjs.config({
    timeoutInterval: 6000,
    timeoutLimit: 10000,
    format: 'csv',
    csvFields: ['productTitle', 'price'],
    writer: 'file',
    outFile: 'D:\\prod_details.csv'
});

pjs.addSuite({
    title: 'ChainReactionCycles Scraper',
    url: productURLs, // This is an array of URLs; two examples are defined below
    scrapers: [
        function() {
            var results = [];
            // Read the product title and price from the page
            var linkTitle = _pjs.getText('#ModelsDisplayStyle4_LblTitle');
            var linkPrice = _pjs.getText('#ModelsDisplayStyle4_LblMinPrice');
            results.push([linkTitle[0], linkPrice[0]]);
            return results;
        }
    ]
});
</code></pre>
<p><strong>URL Arrays Used</strong></p>
<p>This first array <em>DOES NOT WORK</em> and fails after the 3rd or 4th URL.</p>
<pre class="lang-js prettyprint-override"><code>var productURLs = ["8649","17374","7327","7325","14892","8650","8651","14893","18090","51318"];

for (var i = 0; i &lt; productURLs.length; ++i) {
    productURLs[i] = 'http://www.chainreactioncycles.com/Models.aspx?ModelID=' + productURLs[i];
}
</code></pre>
<p>This second array <em>WORKS</em> and does not fail, even though it is from the same site.</p>
<pre class="lang-js prettyprint-override"><code>var categoriesURLs = ["304","2420","965","518","514","1667","521","1302","1138","510"];

for (var i = 0; i &lt; categoriesURLs.length; ++i) {
    categoriesURLs[i] = 'http://www.chainreactioncycles.com/Categories.aspx?CategoryID=' + categoriesURLs[i];
}
</code></pre>
<p><strong>Problem</strong></p>
<p>When iterating through <code>productURLs</code>, the optional callback of PhantomJS's <code>page.open</code> reports <em>failure</em>, even though the page hasn't finished loading.</p>
<p>I know this because I ran the script alongside an HTTP debugger, and the HTTP requests were still running after PhantomJS had reported a page load <em>failure</em>.</p>
<p>However, the code works fine when running with <code>categoriesURLs</code>.</p>
<p><strong>Assumptions</strong></p>
<ol>
<li>All the URLs listed above are VALID</li>
<li>I have the latest versions of both PhantomJS and pjscrape</li>
</ol>
<p><strong>Possible Solutions</strong></p>
<p>These are the solutions I have tried thus far.</p>
<ol>
<li>Disabling image loading: <code>page.options.loadImages = false</code></li>
<li>Setting a larger <code>timeoutInterval</code> in <code>pjs.config</code>. This did not help, apparently because the error generated was a <code>page.open</code> failure and NOT a timeout failure.</li>
</ol>
<p>Any ideas?</p>
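<p>One way to check whether the reported failures are transient might be to retry each URL a few times outside of pjscrape. The following is only a minimal sketch under assumptions: <code>openWithRetry</code> and its parameters are hypothetical names, not part of PhantomJS or pjscrape, and <code>openFn</code> is assumed to follow the <code>page.open(url, callback)</code> convention of calling back with <code>'success'</code> or <code>'fail'</code>.</p>

```javascript
// Hypothetical helper (not a PhantomJS or pjscrape API).
// openFn(url, callback) is assumed to call callback('success') or
// callback('fail'), the way PhantomJS's page.open does.
function openWithRetry(openFn, url, maxAttempts, done) {
    var attempt = 0;

    function tryOnce() {
        attempt += 1;
        openFn(url, function (status) {
            if (status === 'success' || attempt >= maxAttempts) {
                // Report the final status and how many attempts were made
                done(status, attempt);
            } else {
                // The load was reported as failed: try the same URL again
                tryOnce();
            }
        });
    }

    tryOnce();
}
```

<p>In a bare PhantomJS script this could be called as <code>openWithRetry(function (u, cb) { page.open(u, cb); }, productURLs[0], 3, function (status, attempts) { console.log(status, attempts); })</code>, with <code>page</code> created via <code>require('webpage').create()</code>. If a URL only succeeds on a later attempt, that would point toward a transient load failure rather than a bad URL.</p>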