How to scrape data from list of URLs and save data to CSV with nokogiri
<p>I have a file called bontyurls.csv that looks like this:</p>
<pre><code>http://bontrager.com/model/11383
http://bontrager.com/model/01740
http://bontrager.com/model/09595
</code></pre>
<p>I want my script to read that file and then spit out a file like this: bonty_test_urls_results.csv</p>
<pre><code>url,model_names
http://bontrager.com/model/11383,"Road TLR Conversion Kit"
http://bontrager.com/model/01740,"404 File Not Found"
http://bontrager.com/model/09595,"RXL Road"
</code></pre>
<p>Here's what I've got so far:</p>
<pre><code># based on code from here: http://www.andrewsturges.com/2011/09/how-to-harvest-web-data-using-ruby-and.html

require 'nokogiri'
require 'open-uri'
require 'csv'

@urls = Array.new
@model_names = Array.new

urls = CSV.read("bontyurls.csv")
(0..urls.length - 1).each do |index|
  puts urls[index][0]
  doc = Nokogiri::HTML(open(urls[index][0]))
  doc.xpath('//h1').each do |model_name|
    @model_name &lt;&lt; model_name.content
  end
end

# write results to file
CSV.open("bonty_test_urls_results.csv", "wb") do |row|
  row &lt;&lt; ["url", "model_names"]
  (0..@urls.length - 1).each do |index|
    row &lt;&lt; [ @urls[index], @model_names[index]]
  end
end
</code></pre>
<p>That code isn't working. I'm getting this error:</p>
<pre><code>$ ruby bonty_test_urls.rb
http://bontrager.com/model/00310
bonty_test_urls.rb:15:in `block (2 levels) in &lt;main&gt;': undefined method `&lt;&lt;' for nil:NilClass (NoMethodError)
    from /home/simon/.rvm/gems/ruby-1.9.3-p194/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
    from /home/simon/.rvm/gems/ruby-1.9.3-p194/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
    from /home/simon/.rvm/gems/ruby-1.9.3-p194/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
    from bonty_test_urls.rb:14:in `block in &lt;main&gt;'
    from bonty_test_urls.rb:11:in `each'
    from bonty_test_urls.rb:11:in `&lt;main&gt;'
</code></pre>
<p>Here is some code that returns the model_name at least. I'm just having trouble getting it to work in the larger script:</p>
<pre><code>require 'open-uri'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://bontrager.com/model/09124"))
doc.xpath('//h1').each do |node|
  puts node.text
end
</code></pre>
<p>Also, I haven't figured out how to handle the URLs that return a 404.</p>
 
