<p>I have done this a lot. You'll want to use <a href="http://phantomjs.org/" rel="noreferrer">PhantomJS</a> if the website that you're scraping relies heavily on JavaScript. Note that PhantomJS is not Node.js; it's a completely different JavaScript runtime. You can integrate it with Node.js through <a href="https://github.com/sgentle/phantomjs-node" rel="noreferrer">phantomjs-node</a> or <a href="https://github.com/alexscheelmeyer/node-phantom" rel="noreferrer">node-phantom</a>, but both are kinda hacky. YMMV with those. Avoid anything to do with jsdom. It'll cause you headaches, and this includes <a href="http://zombie.labnotes.org/" rel="noreferrer">Zombie.js</a>.</p>
<p>What you should use is <a href="https://github.com/MatthewMueller/cheerio" rel="noreferrer">Cheerio</a> in conjunction with <a href="https://github.com/mikeal/request" rel="noreferrer">Request</a>. This will be sufficient for most web pages.</p>
<p>I wrote a blog post on using Cheerio with Request: <a href="http://procbits.com/2012/04/11/quick-and-dirty-screen-scraping-with-node-js-using-request-and-cheerio/" rel="noreferrer">Quick and Dirty Screen Scraping with Node.js</a>. But, again, if the site is JavaScript intensive, use PhantomJS in conjunction with <a href="http://casperjs.org/" rel="noreferrer">CasperJS</a>.</p>
<p>Hope this helps.</p>
<p>Snippet using Request and Cheerio:</p>
<pre><code>var request = require('request');
var cheerio = require('cheerio');

var searchTerm = 'screen+scraping';
var url = 'http://www.bing.com/search?q=' + searchTerm;

request(url, function (err, resp, body) {
  // Stop on request errors instead of parsing an undefined body.
  if (err) {
    console.error(err);
    return;
  }

  // Parse the returned HTML into a jQuery-like document.
  var $ = cheerio.load(body);
  var links = $('.sb_tlst h3 a'); // use your CSS selector here

  // Print each link's text followed by its URL.
  links.each(function (i, link) {
    console.log($(link).text() + ':\n ' + $(link).attr('href'));
  });
});
</code></pre>
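<p>The snippet above hardcodes an already-encoded search term (<code>screen+scraping</code>). If the term comes from user input, it probably needs encoding before it goes into the URL. Here is a minimal sketch of one way to do that; <code>buildSearchUrl</code> is a hypothetical helper name, not part of Request or Cheerio, and it assumes a Node.js version where <code>URLSearchParams</code> is available as a global.</p>

```javascript
// Hypothetical helper (not part of Request or Cheerio): builds the search URL
// used in the snippet above from a plain-text search term.
function buildSearchUrl(searchTerm) {
  // URLSearchParams form-encodes the value: spaces become '+', and
  // reserved characters such as '#' and '&' are percent-encoded.
  var query = new URLSearchParams({ q: searchTerm });
  return 'http://www.bing.com/search?' + query.toString();
}

console.log(buildSearchUrl('screen scraping'));
// prints "http://www.bing.com/search?q=screen+scraping"
```

<p>The result can be passed to <code>request()</code> in place of the hand-built <code>url</code> variable.</p>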