Nokogiri HTML parse undefined method 'namespace_definitions' blows up on <o:p> tag
<p>I have a Rails app that parses HTML using the nokogiri gem, version 1.4.0.</p>
<p>To parse and clean up the HTML fragment, I'm using this:</p>
<pre><code>Nokogiri::HTML::DocumentFragment.parse(text).to_html
</code></pre>
<p>I get the following error when I try to parse certain inputs that parsed fine with hpricot:</p>
<pre><code>NoMethodError: undefined method `namespace_definitions' for nil:NilClass
    from .../nokogiri-1.4.0/lib/nokogiri/xml/fragment_handler.rb:33:in `start_element'
    from .../nokogiri-1.4.0/lib/nokogiri/html/sax/parser.rb:34:in `parse_with'
    from .../nokogiri-1.4.0/lib/nokogiri/html/sax/parser.rb:34:in `parse_memory'
    from .../nokogiri-1.4.0/lib/nokogiri/xml/sax/parser.rb:83:in `parse'
    from .../nokogiri-1.4.0/lib/nokogiri/xml/document_fragment.rb:7:in `initialize'
    from .../nokogiri-1.4.0/lib/nokogiri/html/document_fragment.rb:9:in `new'
    from .../nokogiri-1.4.0/lib/nokogiri/html/document_fragment.rb:9:in `parse'
</code></pre>
<p>I've tracked it down to the &lt;o:p&gt; tag, which as far as I can tell is something MS Office uses to mark paragraph breaks.</p>
<pre><code>&lt;p class="MsoNormal"&gt;&lt;span style="font-family:&amp;quot;Arial&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;
</code></pre>
<p>Is there a way to get Nokogiri to not blow up on this tag? Ideally it would just leave the tag unchanged, as hpricot did, if that's possible. If not, stripping the tags would at least be better than raising an error.</p>
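<p>For reference, the stripping fallback I have in mind would look roughly like the sketch below. It removes the Office tags with a plain regex before the text ever reaches Nokogiri. The helper name <code>strip_office_tags</code> and the constant <code>OFFICE_TAG</code> are hypothetical (they are not part of Nokogiri), and I'm assuming only the <code>o:</code> prefix needs handling.</p>

```ruby
# Hypothetical helper, not part of Nokogiri: matches opening, closing and
# self-closing MS Office tags with the "o:" prefix, e.g. <o:p>, </o:p>, <o:p/>.
OFFICE_TAG = %r{</?o:[a-z0-9]+\b[^>]*>}i

# Removes the tags themselves but keeps any text that sat between them.
def strip_office_tags(html)
  html.gsub(OFFICE_TAG, '')
end

sample  = '<p class="MsoNormal"><span><o:p></o:p></span></p>'
cleaned = strip_office_tags(sample)
# cleaned could then be passed to Nokogiri::HTML::DocumentFragment.parse
```

<p>This is only a workaround: it discards the tags rather than preserving them, and a regex will not cope with every malformed input.</p>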
 
