Making a while loop asynchronous in Node.js
<p>There are several similar questions here on Stack Overflow, but I can't get any of the answers working for me. I'm completely new to Node and to the idea of asynchronous programming, so please bear with me.</p>
<p>I'm building a scraper that currently has a four-step process:</p>
<ol>
<li>I give it a collection of links.</li>
<li>It goes to each of these links and finds every relevant <code>img src</code> on the page.</li>
<li>It finds the "next page" link, gets its <code>href</code>, retrieves the DOM from that <code>href</code>, and repeats step #2.</li>
<li>All of these <code>img src</code> values are put into an array and returned.</li>
</ol>
<p>Here's the code. <code>getLinks</code> can be called asynchronously, but the <code>while</code> loop in it currently cannot:</p>
<pre><code>function scrape(url, oncomplete) {
    console.log("Scrape Function: " + url);
    request(url, function(err, resp, body) {
        if (err) {
            console.log("UHOH");
            throw err;
        }
        var html = cheerio.load(body);
        oncomplete(html);
    });
}

function getLinks(url, prodURL, baseURL, next_select) {
    var urls = [];
    while(url) {
        console.log("GetLinks Indexing: " + url);
        var html = scrape(url, function(data) {
            $ = data;
            $(prodURL).each(function() {
                var theHref = $(this).attr('href');
                urls.push(baseURL + theHref);
            });
            next = $(next_select).first().attr('href');
            url = next ? baseURL + next : null;
        });
    }
    console.log(urls);
    return urls;
}
</code></pre>
<p>At present this goes into an infinite loop without scraping anything. If I put the <code>url = next ? baseURL + next : null;</code> line outside of the callback, I get a <code>"next" is not defined</code> error.</p>
<p>Any ideas on how I can rework this to make it Node-friendly? It seems like, by this problem's very nature, it needs to be blocking, no?</p>
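<p>One possible way to express the four steps above without a blocking loop is to let each callback start the next request, and to hand the collected array to a final callback instead of returning it. The sketch below is only an illustration under stated assumptions: <code>fakeSite</code>, <code>fakeScrape</code>, <code>visit</code>, and the <code>done</code> callback are hypothetical names that do not appear in the original code, and an in-memory object stands in for <code>request</code> and <code>cheerio</code> so the example runs on its own.</p>

```javascript
// Hypothetical in-memory "site" standing in for real pages (an assumption
// made so the sketch runs without request or cheerio).
var fakeSite = {
  '/page1': { links: ['/a.jpg', '/b.jpg'], next: '/page2' },
  '/page2': { links: ['/c.jpg'], next: null }
};

// Hypothetical stand-in for scrape(): reports the page asynchronously,
// using the Node-style (err, result) callback convention.
function fakeScrape(url, oncomplete) {
  setImmediate(function () {
    var page = fakeSite[url];
    if (!page) {
      return oncomplete(new Error('No such page: ' + url));
    }
    oncomplete(null, page);
  });
}

// Follows the "next page" chain without a while loop: each callback
// either requests the next page or delivers the collected URLs.
function getLinks(url, baseURL, scrapeFn, done) {
  var urls = [];

  function visit(currentUrl) {
    scrapeFn(currentUrl, function (err, page) {
      if (err) {
        return done(err);
      }
      page.links.forEach(function (href) {
        urls.push(baseURL + href);
      });
      if (page.next) {
        visit(page.next);   // more pages: continue the chain
      } else {
        done(null, urls);   // no "next" link: deliver the results
      }
    });
  }

  visit(url);
}

// Usage: the results arrive in the callback, not as a return value.
getLinks('/page1', 'http://example.com', fakeScrape, function (err, urls) {
  if (err) { throw err; }
  console.log(urls);
});
```

<p>The design choice here is that nothing blocks: <code>getLinks</code> returns immediately, and the event loop stays free to run each response callback, which is what the original <code>while</code> loop prevented.</p>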