<p>It has to do with the way Nokogiri's <a href="http://nokogiri.rubyforge.org/nokogiri/Nokogiri.html#M000355" rel="noreferrer">parse method</a> works. Here's the source:</p>
<pre><code># File lib/nokogiri.rb, line 55
def parse string, url = nil, encoding = nil, options = nil
  doc =
    if string =~ /^\s*&lt;[^Hh&gt;]*html/i # Probably html
      Nokogiri::HTML::Document.parse(string, url, encoding, options || XML::ParseOptions::DEFAULT_HTML)
    else
      Nokogiri::XML::Document.parse(string, url, encoding, options || XML::ParseOptions::DEFAULT_XML)
    end
  yield doc if block_given?
  doc
end
</code></pre>
<p>The key is the line <code>if string =~ /^\s*&lt;[^Hh&gt;]*html/i # Probably html</code>. When you just use <code>open</code>, it returns an object that isn't a string, so the regex match never succeeds and the input is always parsed as XML. On the other hand, <code>read</code> returns a string, so it <em>could</em> be regarded as HTML. In this case it is, because it matches that regex. Here's the start of that string:</p>
<pre><code>&lt;!DOCTYPE html PUBLIC </code></pre>
<p>The <code>[^Hh&gt;]*</code> part of the regex consumes the "!DOCTYPE ", the literal "html" follows, and so the string is assumed to be HTML. Why someone selected this regex to determine if the file is HTML is beyond me. With this regex, a file that begins with a tag like <code>&lt;definitely-not-html&gt;</code> is considered HTML, but <code>&lt;this-is-still-not-html&gt;</code> is considered XML, because the "h" in "this" stops <code>[^Hh&gt;]*</code> before it can reach "html". You're probably best off staying away from this dumb function and calling <code>Nokogiri::HTML::Document.parse</code> or <code>Nokogiri::XML::Document.parse</code> directly.</p>
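<p>To see the sniffing behaviour without installing anything, here is a minimal sketch that applies the same regex to a few strings in plain Ruby. The constant <code>HTML_SNIFF</code> and the helper <code>sniffed_as_html?</code> are hypothetical names made up for this illustration and are not part of Nokogiri; only the regex itself is taken from the source quoted above.</p>

```ruby
# Hypothetical names for illustration; only the regex comes from Nokogiri's source.
HTML_SNIFF = /^\s*<[^Hh>]*html/i

# Returns true when the regex would send the string down the HTML branch.
def sniffed_as_html?(string)
  !!(string =~ HTML_SNIFF)
end

samples = [
  "<!DOCTYPE html PUBLIC ",    # "!DOCTYPE " has no h/H, then "html" follows
  "<definitely-not-html>",     # "definitely-not-" has no h/H, then "html" follows
  "<this-is-still-not-html>",  # the "h" in "this" blocks the character class
]

samples.each do |s|
  puts "#{s.inspect} => #{sniffed_as_html?(s) ? 'HTML' : 'XML'}"
end
```

<p>If you already know what kind of document you have, passing the string to <code>Nokogiri::HTML::Document.parse</code> or <code>Nokogiri::XML::Document.parse</code> sidesteps the guess entirely.</p>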
 
