Note that there are some explanatory texts on larger screens.

plurals
  1. POWhat would cause HtmlUnit to throw 40+ Exceptions and fail when I try to load a page with javascript?
    text
    copied!<p>I'm attempting to scrape the practice quizzes from my History Book's website so that I can assemble them all into one big test to study. </p> <p>The page is <a href="http://www.wwnorton.com/college/polisci/we-the-people8/shorter/ch/15/quiz.aspx" rel="nofollow">here</a>. </p> <p>It is all driven by <code>javascript</code>, so I'm attempting to use <code>HtmlUnit</code> to scrape the page. </p> <p>I'm more of a Python guy, so I set up my initial code to pretty closely mirror HtmlUnit's Getting Started section: </p> <pre><code>import com.gargoylesoftware.htmlunit.*; import com.gargoylesoftware.htmlunit.WebClient; import com.gargoylesoftware.htmlunit.html.HtmlPage; public class HelloWorld { public static void main(String[] args) throws Exception { homePage(); System.out.println("Done."); } public static void homePage() throws Exception { final WebClient webClient = new WebClient(); String url = "http://www.wwnorton.com/college/polisci/we-the-people8/shorter/ch/15/quiz.aspx"; final HtmlPage page = webClient.getPage(url); System.out.println(page.asText()); webClient.closeAllWindows(); } } </code></pre> <p>Upon running, I get the following print out: Apr 27, 2013 12:50:16 PM </p> <pre><code>com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0] Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor WARNING: Automation server can't create object for 'QuickTime.QuickTime'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError SEVERE: runtimeError: message=[Automation server can't create object for 'QuickTime.QuickTime'.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0] Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor WARNING: Automation server can't create object for 'wmplayer.ocx'. Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError SEVERE: runtimeError: message=[Automation server can't create object for 'wmplayer.ocx'.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0] Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://www.wwnorton.com/common/js/shadowbox/shadowbox.js] line=[8] lineSource=[null] lineOffset=[0] Apr 27, 2013 12:50:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://www.wwnorton.com/college/polisci/we-the-people8/shorter/ScriptResource.axd?d=_NFxbFNZD5BQbaNY82unni-tpvKHIpx_DFI8m05N9H4ZnCF8k_zg2bVneHOOjAQ58itL8--3tACMpiC67WkR4iVWW6J5oqm5iyilArcp1bA4Jl6UUf2tHGjuNfP4BmYCDciCRxCM3FV_f5qrDZM2IQ2&amp;t=fffffffff9cbe881] line=[164] lineSource=[null] lineOffset=[0] Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'application/x-javascript'. Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Obsolete content type encountered: 'text/javascript'. Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument execCommand WARNING: Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented) Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/common/css/ss2.0/base.css' [848:2] Error in style rule. (Invalid token "!important". Was expecting one of: &lt;EOF&gt;, &lt;S&gt;, &lt;IDENT&gt;, "}", ";".) Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/common/css/ss2.0/base.css' [848:2] Ignoring the following declarations in this rule. Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/reset-min.css' [57:1] Error in style rule. (Invalid token "*". Was expecting one of: &lt;EOF&gt;, &lt;S&gt;, &lt;IDENT&gt;, "}", ";".) Apr 27, 2013 12:50:17 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/reset-min.css' [57:1] Ignoring the following declarations in this rule. Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/base-min.css' [8:580] Error in style rule. (Invalid token "*". Was expecting one of: &lt;EOF&gt;, &lt;S&gt;, &lt;IDENT&gt;, "}", ";".) Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/base-min.css' [8:580] Ignoring the following declarations in this rule. Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:55] Error in style rule. (Invalid token "*". Was expecting one of: &lt;EOF&gt;, &lt;S&gt;, &lt;IDENT&gt;, "}", ";".) Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:55] Ignoring the following declarations in this rule. Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:237] Error in style rule. (Invalid token "*". Was expecting one of: &lt;EOF&gt;, &lt;S&gt;, &lt;IDENT&gt;, "}", ";".) Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/common/css/min/fonts-min.css' [8:237] Ignoring the following declarations in this rule. Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/college/polisci/we-the-people8/shorter/css/custom.css' [23:1] Error in style rule. (Invalid token "body". Was expecting one of: &lt;S&gt;, &lt;LBRACE&gt;, &lt;COMMA&gt;.) Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/college/polisci/we-the-people8/shorter/css/custom.css' [23:1] Ignoring the following declarations in this rule. Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: 'http://www.wwnorton.com/common/colorblindcss/ss2.0/style.css' [1:51] Error in style rule. (Invalid token "&lt;EOF&gt;". Was expecting one of: &lt;S&gt;, &lt;LBRACE&gt;, &lt;COMMA&gt;, &lt;HASH&gt;, ".", ":", "[", &lt;S&gt;.) Apr 27, 2013 12:50:18 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: 'http://www.wwnorton.com/common/colorblindcss/ss2.0/style.css' [1:51] Ignoring the following declarations in this rule. Exception in thread "main" ======= EXCEPTION START ======== Exception class=[java.lang.RuntimeException] com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:669) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:601) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:555) at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1082) at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:399) at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:260) at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:276) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:676) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:635) at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3074) at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2041) at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:892) at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241) at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156) at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:434) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:309) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359) at HelloWorld.homePage(HelloWorld.java:16) at HelloWorld.main(HelloWorld.java:7) Caused by: java.lang.RuntimeException: Exception invoking setOuterHTML at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:148) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:295) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:368) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2796) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:521) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2479) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1569) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1564) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1253) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:405) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:275) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3031) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:546) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:654) ... 31 more Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[&lt;div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;"&gt;] is null. at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1036) at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1067) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:120) ... 47 more Enclosed exception: java.lang.RuntimeException: Exception invoking setOuterHTML at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:148) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:295) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:368) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2796) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:521) at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2479) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1569) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1564) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1253) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:405) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:275) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3031) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:546) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:654) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:601) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:555) at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1082) at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:399) at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:260) at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:276) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:676) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:635) at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3074) at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2041) at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:892) at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241) at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156) at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:434) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:309) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359) at HelloWorld.homePage(HelloWorld.java:16) at HelloWorld.main(HelloWorld.java:7) Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[&lt;div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;"&gt;] is null. at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1036) at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1067) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:120) ... 47 more ======= EXCEPTION END ======== </code></pre> <p>What's causing all of these exceptions? Why am I (seemingly) not able to open the page? </p> <h3>Edit</h3> <p>I tried loading <a href="http://www1.plasticsurgery.org/find_a_surgeon/" rel="nofollow">another page</a> that has <code>javascript</code> driven menus and got similar output. </p> <p>I then tried simply loading Yahoo.com. </p> <pre><code>String url = "http://www.yahoo.com"; final HtmlPage page = webClient.getPage(url); System.out.println(page.asText()); </code></pre> <p>No errors this time, but <code>page.asText()</code> returns nothing..</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload