  1. When using HtmlUnit, how can I configure the underlying NekoHtml parser?
    <p>I'm using HtmlUnit to try to scrape a webpage because of its JavaScript support. (I'd rather use Jsoup, but it has no JS support.)</p>
    <p>The issue relates to a feature of the underlying NekoHtml parser: "<a href="http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe" rel="nofollow">http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe</a>"</p>
    <p>See: <a href="http://nekohtml.sourceforge.net/settings.html" rel="nofollow">http://nekohtml.sourceforge.net/settings.html</a></p>
    <p>This can apparently be enabled in Neko, but I'm using HtmlUnit. Is there a way to configure the underlying Neko parser that HtmlUnit is using to enable this feature?</p>
    <p>When attempting to run this code:</p>
    <pre><code>final WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage(url.toString());
</code></pre>
    <p>I'm getting this error:</p>
    <pre><code>Caused by: com.gargoylesoftware.htmlunit.ObjectInstantiationException: unable to create HTML parser
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.&lt;init&gt;(HTMLParser.java:418)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.&lt;init&gt;(HTMLParser.java:342)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:203)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:179)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:221)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:106)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:433)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)
    at
Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe' is not recognized.
    at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.&lt;init&gt;(HTMLParser.java:411)
    ... 41 more
</code></pre>
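The stack trace shows HtmlUnit itself trying to set the feature and the parser rejecting it. One possible explanation, which nothing in the question confirms, is that the NekoHtml or Xerces jar actually loaded is not the one this HtmlUnit version expects. A minimal diagnostic sketch for checking that, using only the JDK: `ParserOrigin` and `locate` are hypothetical names, the first and third class names come from the stack trace, and `org.cyberneko.html.HTMLConfiguration` is assumed to be present in the NekoHtml jar in use.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which jar (or directory) a class was loaded from.
class ParserOrigin {

    /** Returns the load location of the named class, or a short note if it cannot be determined. */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK bootstrap loader have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown location)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.gargoylesoftware.htmlunit.WebClient",     // from the stack trace
            "org.cyberneko.html.HTMLConfiguration",        // assumed NekoHtml class
            "org.apache.xerces.parsers.AbstractSAXParser"  // from the stack trace
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this with the same classpath as the failing program prints one location per class. If the NekoHtml location points at a jar other than the one shipped alongside HtmlUnit, a version mismatch would be worth ruling out before looking for a configuration hook.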