
Opening ISO-8859-1 encoded HTML with Nokogiri messes up accents
<p>I'm trying to make some changes to an HTML page encoded with charset=iso-8859-1.</p>
<pre><code>doc = Nokogiri::HTML(open(html_file))</code></pre>
<p><code>puts doc.to_html</code> messes up all the accents in the page, so if I save it back, it looks broken in the browser as well.</p>
<p>I'm still on Rails 3.0.6. Any hints on how to fix this problem?</p>
<p>Here's one of the affected pages, for example: <a href="http://www.elmundo.es/accesible/elmundo/2012/03/07/solidaridad/1331108705.html" rel="nofollow">http://www.elmundo.es/accesible/elmundo/2012/03/07/solidaridad/1331108705.html</a></p>
<p>I've also asked on GitHub, but I have the feeling this will be faster. I'll update both places if I find a fix.</p>
<p><strong>UPDATE 1</strong> 24 March 2012</p>
<p>Thanks for the comments. I've managed to partially solve this issue. However, I believe it has nothing to do with Nokogiri: as I mentioned in a comment, just opening and saving the file is enough to mess up the accents.</p>
<p>The closest I've got to a fix is this:</p>
<pre><code>require 'nokogiri'

# Read the whole file into a string (the block form closes the file afterwards)
text = File.open(html_file, "r") { |thefile| thefile.read }
doc = Nokogiri::HTML(text)

# ... do any stuff with nokogiri

File.open(html_file, 'w') { |f| f.write(doc.to_html) }
</code></pre>
<p>The original file was ISO-8859-1; the saved one is UTF-8 and looks pretty much OK. The accents are in place, except for the accents on capital letters :-P There I get question marks, as in Econom�a, where there should be an í (i with an accent).</p>
<p>I think I'm getting closer. If someone has a hint for the capital letters as well, it might be almost done.</p>
<p>Cheers.</p>
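<p>A minimal sketch of one plausible mechanism, using only Ruby's standard library and assuming the page really is ISO-8859-1 (and Ruby 1.9.3 or later, for encoding-aware IO and <code>File.binwrite</code>): the same bytes read as UTF-8 are invalid and surface as �, while declaring the external encoding when opening the file (<code>"r:ISO-8859-1:UTF-8"</code>) makes Ruby transcode them to UTF-8 as it reads. The helper <code>latin1_roundtrip</code> and the sample markup are hypothetical stand-ins for <code>html_file</code>. The decoded string could then be handed to Nokogiri; <code>Nokogiri::HTML</code> also accepts an encoding as its third argument, which is an alternative not exercised here.</p>

```ruby
require "tmpdir"

# Hypothetical helper: writes raw ISO-8859-1 bytes to a temporary file
# (a stand-in for html_file), reads the file back two ways, and saves
# the correctly decoded text as UTF-8. Returns [misread, decoded, saved_bytes].
def latin1_roundtrip(latin1_bytes)
  Dir.mktmpdir do |dir|
    html_file = File.join(dir, "page.html")
    File.binwrite(html_file, latin1_bytes)

    # Misread: label the Latin-1 bytes as UTF-8. No conversion happens, so the
    # lone 0xED byte (Latin-1 "í") stays in the string as invalid UTF-8.
    misread = File.open(html_file, "r:UTF-8", &:read)

    # Decoded: declare the external encoding and request UTF-8 internally,
    # so Ruby transcodes while reading (0xED becomes U+00ED, "í").
    decoded = File.open(html_file, "r:ISO-8859-1:UTF-8", &:read)

    # Save the decoded text back explicitly as UTF-8.
    out_file = File.join(dir, "page-utf8.html")
    File.open(out_file, "w:UTF-8") { |f| f.write(decoded) }

    [misread, decoded, File.binread(out_file)]
  end
end

# Hypothetical sample markup: "Economía" stored as ISO-8859-1 bytes.
misread, decoded, saved = latin1_roundtrip("<p>Econom\xEDa</p>".b)
puts misread.valid_encoding?                 # prints false
puts misread.scrub                           # prints <p>Econom�a</p>
puts decoded                                 # prints <p>Economía</p>
puts saved.bytes == "<p>Economía</p>".bytes  # prints true
```

<p>The explicit <code>"r:UTF-8"</code> in the misread case keeps the demonstration independent of the machine's default external encoding; in real code the default may differ, which is one reason the same script can behave differently across environments.</p>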