Invalid XML Character During Unmarshall
<p>I am marshalling objects to an XML file using the "UTF-8" encoding. The file is generated successfully, but when I try to unmarshal it back, I get this error:</p>
<blockquote> <p>An invalid XML character (Unicode: 0x{2}) was found in the value of attribute "{1}" and element is "0"</p> </blockquote>
<p>The character is 0x1A (\u001a), which is valid in UTF-8 but illegal in XML. The JAXB Marshaller allows writing this character into the XML file, but the Unmarshaller cannot parse it back. I tried other encodings (UTF-16, ASCII, etc.), but the error remains.</p>
<p>The common solution is to remove or replace the invalid character before XML parsing. But if we need this character back, how can we recover the original character after unmarshalling?</p>
<hr>
<p>While looking for a solution, I decided to replace the invalid characters with a substitute character (for example, a dot ".") before unmarshalling.</p>
<p>I have created this class:</p>
<pre><code>public class InvalidXMLCharacterFilterReader extends FilterReader {

    public static final char substitute = '.';

    public InvalidXMLCharacterFilterReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1)
            return -1;

        for (int readPos = off; readPos &lt; off + read; readPos++) {
            if (!isValid(cbuf[readPos])) {
                cbuf[readPos] = substitute;
            }
        }
        return readPos - off + 1;
    }

    public boolean isValid(char c) {
        if ((c == 0x9)
                || (c == 0xA)
                || (c == 0xD)
                || ((c &gt;= 0x20) &amp;&amp; (c &lt;= 0xD7FF))
                || ((c &gt;= 0xE000) &amp;&amp; (c &lt;= 0xFFFD))
                || ((c &gt;= 0x10000) &amp;&amp; (c &lt;= 0x10FFFF))) {
            return true;
        } else
            return false;
    }
}
</code></pre>
<p>Then this is how I read and unmarshal the file:</p>
<pre><code>FileReader fileReader = new FileReader(this.getFile());
Reader reader = new InvalidXMLCharacterFilterReader(fileReader);
Object o = (Object)um.unmarshal(reader);
</code></pre>
<p>Somehow the reader does not replace the invalid characters with the character I want. The result is invalid XML data that cannot be unmarshalled. Is there something wrong with my InvalidXMLCharacterFilterReader class?</p>
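<p>For comparison, here is a minimal sketch of how such a filter could be written. It is one possible variant, not a confirmed fix. It assumes the likely culprit is the return value of <code>read(char[], int, int)</code>: the posted method returns <code>readPos - off + 1</code>, where <code>readPos</code> is no longer in scope after the loop, while the <code>Reader</code> contract expects the number of characters actually read. The names <code>XmlSafeFilterReader</code>, <code>XmlSafeDemo</code> and <code>sanitize</code> are hypothetical. The sketch also assumes that surrogate halves arrive in well-formed pairs, since a Java <code>char</code> can never hold a value above 0xFFFF by itself.</p>

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical variant of the filter: replaces chars forbidden by XML 1.0. */
class XmlSafeFilterReader extends FilterReader {

    static final char SUBSTITUTE = '.';

    XmlSafeFilterReader(Reader in) {
        super(in);
    }

    // Single-char reads bypass read(char[], int, int), so filter them too.
    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == -1 || isValid((char) c)) ? c : SUBSTITUTE;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) {
            return -1;
        }
        for (int pos = off; pos < off + read; pos++) {
            if (!isValid(cbuf[pos])) {
                cbuf[pos] = SUBSTITUTE;
            }
        }
        // Report exactly how many chars the underlying reader delivered.
        return read;
    }

    static boolean isValid(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                // Assumption: surrogates form valid pairs (U+10000..U+10FFFF).
                || Character.isSurrogate(c);
    }
}

class XmlSafeDemo {

    /** Runs a string through the filter; used here only for demonstration. */
    static String sanitize(String s) {
        try (Reader r = new XmlSafeFilterReader(new StringReader(s))) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = "a" + (char) 0x1A + "b";
        System.out.println(sanitize(input)); // prints "a.b"
    }
}
```

<p>With the file from the question, the reader would be wrapped the same way as before, for example <code>new XmlSafeFilterReader(fileReader)</code>. Note that substitution is lossy: the original 0x1A cannot be recovered after unmarshalling unless it is encoded reversibly before marshalling.</p>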