<p>I may have a solution to your problem, if you're willing to give up on XOM. My solution consists of using the <a href="http://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/XPath.html" rel="nofollow">XPath API</a> and <a href="http://santuario.apache.org/" rel="nofollow">Apache Santuario</a>.</p>
<p>The difference in performance is impressive, so I thought it would be good to provide a comparison.</p>
<p>For the tests I used the 1.5 MB XML file you provided in your question.</p>
<h2>The XOM Test</h2>
<pre><code>import java.io.FileInputStream;
import java.io.FileOutputStream;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Nodes;

// Parse without validation, then canonicalize every node and attribute under the root.
FileInputStream xmlFile = new FileInputStream("input.xml");
Builder builder = new Builder(false);
Document doc = builder.build(xmlFile);
FileOutputStream fos = new FileOutputStream("output.xml");
nu.xom.canonical.Canonicalizer outputter = new nu.xom.canonical.Canonicalizer(fos);
Nodes nodes = doc.getRootElement().query("./descendant-or-self::node()|./@*");
outputter.write(nodes);
fos.close();
</code></pre>
<h2>The XPath/Santuario Test</h2>
<pre><code>import java.io.File;
import java.io.FileOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.apache.commons.io.IOUtils; // assumption: IOUtils comes from Apache Commons IO
import org.apache.xml.security.c14n.Canonicalizer;

// Santuario must be initialized before a Canonicalizer can be obtained.
org.apache.xml.security.Init.init();
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
org.w3c.dom.Document doc = builder.parse("input.xml");
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
org.w3c.dom.NodeList result = (org.w3c.dom.NodeList) xpath.evaluate("./descendant-or-self::node()|./@*", doc, XPathConstants.NODESET);
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte[] canonXmlBytes = canon.canonicalizeXPathNodeSet(result);
// Close the stream explicitly so the output is flushed to disk.
FileOutputStream out = new FileOutputStream(new File("output.xml"));
IOUtils.write(canonXmlBytes, out);
out.close();
</code></pre>
<h2>The Results</h2>
<p><img src="https://lh6.googleusercontent.com/-NeJJ2ETZ-JQ/UMClI5Qa58I/AAAAAAAABG4/QC9fH3a_hvw/s477/Selection_001.png" alt="graphic result"></p>
<p>Below is a table with the results in seconds.
Each test was run 16 times.</p>
<pre><code>╔═════════════════╦═════════╦═══════════╗
║ Test            ║ Average ║ Std. Dev. ║
╠═════════════════╬═════════╬═══════════╣
║ XOM             ║ 140.433 ║ 4.851     ║
╠═════════════════╬═════════╬═══════════╣
║ XPath/Santuario ║ 2.4585  ║ 0.11187   ║
╚═════════════════╩═════════╩═══════════╝
</code></pre>
<p>The difference in performance is huge, and it is related to how each library implements the <a href="http://www.w3.org/TR/xpath/" rel="nofollow">XML Path Language</a>. The downside of XPath/Santuario is that these APIs are not as simple to use as XOM.</p>
<h2>Test Details</h2>
<p>Machine: Intel Core i5, 4 GB RAM<br>
OS: Debian 6.0 64-bit<br>
Java: OpenJDK 1.6.0_18 64-bit<br>
XOM: 1.2.8<br>
Apache Santuario: 1.5.3<br></p>
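<p>For anyone who wants to repeat the measurement, here is a minimal sketch of a timing harness that runs a task 16 times and reports the average and standard deviation in seconds. It is not the harness used for the numbers above: the class name <code>BenchmarkSketch</code>, its helper methods and the placeholder workload are all hypothetical, it assumes the sample standard deviation (n - 1 denominator), and it needs Java 8 or later because of the lambda.</p>

```java
public class BenchmarkSketch {

    /** Runs the task the given number of times; returns each wall-clock duration in seconds. */
    static double[] timeRuns(Runnable task, int runs) {
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return seconds;
    }

    /** Arithmetic mean of the samples. */
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (n - 1 denominator); this choice is an assumption. */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            double d = x - m;
            squares += d * d;
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload: replace the body with the XOM or
        // XPath/Santuario test to time the real canonicalization.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        double[] seconds = timeRuns(placeholder, 16);
        System.out.printf("average = %.4f s, std. dev. = %.5f s%n",
                mean(seconds), sampleStdDev(seconds));
    }
}
```

<p>Timing whole runs with <code>System.nanoTime()</code> is coarse and includes JVM warm-up, so treat the output as a rough comparison rather than a precise benchmark.</p>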
 
