Producing valid XML with Java and UTF-8 encoding
<p>I am using JAXP to generate and parse an XML document from which some fields are loaded from a database.</p>
<p>Code to serialize the XML:</p>
<pre><code>DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("test");
root.setAttribute("version", text);
doc.appendChild(root);
DOMSource domSource = new DOMSource(doc);
TransformerFactory tFactory = TransformerFactory.newInstance();
FileWriter out = new FileWriter("test.xml");
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(domSource, new StreamResult(out));
</code></pre>
<p>Code to parse the XML:</p>
<pre><code>DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("test.xml");
</code></pre>
<p>And I encounter the following exception:</p>
<pre><code>[Fatal Error] test.xml:1:4: Invalid byte 1 of 1-byte UTF-8 sequence.
Exception in thread "main" org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at com.test.Test.xml(Test.java:27)
    at com.test.Test.main(Test.java:55)
</code></pre>
<p>The String text includes u-umlaut and o-umlaut (character codes 0xFC and 0xF6). These are the characters that are causing the error. When I escape the String myself to use &amp;#xFC; and &amp;#xF6; then the problem goes away. Other entities are automatically encoded when I write out the XML.</p>
<p>How do I get my output to be written / read properly without substituting these characters myself?</p>
<p>(I've read the following questions already:</p>
<p><a href="https://stackoverflow.com/questions/156697/how-to-encode-characters-from-oracle-to-xml">How to encode characters from Oracle to XML?</a></p>
<p><a href="https://stackoverflow.com/questions/216890/repairing-wrong-encoding-in-xml-files">Repairing wrong encoding in XML files</a>)</p>
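<p>A possible explanation, offered as a sketch rather than a confirmed diagnosis: FileWriter encodes characters with the platform default charset, which before Java 18 was frequently not UTF-8. The XML declaration would then claim UTF-8 while the bytes on disk are in another encoding, so 0xFC and 0xF6 reach the parser as invalid UTF-8. Passing the Transformer an OutputStream instead of a Writer lets it apply the encoding named in OutputKeys.ENCODING itself. The example below assumes that cause. The class name Utf8XmlRoundTrip and its helper methods are hypothetical, and it uses in-memory streams so it runs without touching the file system. For a real file, a FileOutputStream for "test.xml" would take the place of the ByteArrayOutputStream.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original question.
class Utf8XmlRoundTrip {

    // Builds <test version="..."/> and serializes it to bytes.
    static byte[] serialize(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("test");
            root.setAttribute("version", text);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            // An OutputStream (not a Writer) leaves the character encoding
            // to the transformer, which uses the ENCODING output property.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    // Parses the bytes and returns the root element's "version" attribute.
    static String parseVersion(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getAttribute("version");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // \u00FC and \u00F6 are u-umlaut and o-umlaut, as in the question.
        String text = "f\u00FCr sch\u00F6n";
        byte[] xml = serialize(text);
        System.out.println(text.equals(parseVersion(xml)));
    }
}
```

<p>If a Writer is required, wrapping the stream in an OutputStreamWriter constructed with StandardCharsets.UTF_8 should keep the bytes consistent with the declared encoding.</p>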
 
