<p>The byte order mark is likely to be one of these byte sequences:</p>
<pre><code>UTF-8 BOM:    ef bb bf
UTF-16BE BOM: fe ff
UTF-16LE BOM: ff fe
UTF-32BE BOM: 00 00 fe ff
UTF-32LE BOM: ff fe 00 00
</code></pre>
<p>These are the variously encoded forms of the Unicode code point U+FEFF. It can be expressed as a Java char literal using <code>'\uFEFF'</code> (Java char values are <em>implicitly</em> UTF-16). Most non-Unicode encodings have no mapping for U+FEFF, so they cannot encode a BOM at all. (<a href="http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html#javaencoding_boms" rel="noreferrer">More on encoding the BOM using Java here</a>.)</p>
<p>When it comes to BOMs and XML, they are optional (see also the <a href="http://unicode.org/faq/utf_bom.html#BOM" rel="noreferrer">Unicode BOM FAQ</a>). Detecting the encoding of an XML document is relatively straightforward if the encoding is specified in the declaration. Always make sure that the XML declaration (<code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;</code>) matches the encoding actually used to write the document. If you are strict about this, parsers should be able to interpret your documents correctly. (<a href="http://www.w3.org/TR/REC-xml/#sec-guessing" rel="noreferrer">XML spec on encoding detection.</a>)</p>
<p>I advocate encoding as Unicode wherever possible (see also the <a href="http://cafe.elharo.com/programming/the-ten-commandments-of-unicode/" rel="noreferrer">10 Commandments of Unicode</a>). That said, XML allows any Unicode character to be represented with a numeric character reference (e.g. 'A' could be written as <code>&amp;#x0041;</code>), so a Unicode encoding is not strictly required to avoid data loss.</p>
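<p>As a rough illustration of the byte sequences listed above, here is a minimal sketch of BOM sniffing in Java. The class name <code>BomSniffer</code> and its <code>detect</code> method are hypothetical names made up for this example, not part of any library. The sketch assumes the first few bytes of the document are already in memory, and it only recognises the five signatures from the table.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a library API.
final class BomSniffer {
    // Longer signatures are listed first because the UTF-32LE BOM (ff fe 00 00)
    // begins with the same two bytes as the UTF-16LE BOM (ff fe).
    private static final String[] NAMES = {
        "UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"
    };
    private static final int[][] BOMS = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xEF, 0xBB, 0xBF},
        {0xFE, 0xFF},
        {0xFF, 0xFE},
    };

    /** Returns the encoding name implied by a leading BOM, or null if none matches. */
    static String detect(byte[] data) {
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(data, BOMS[i])) {
                return NAMES[i];
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned values; Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Encoding U+FEFF explicitly produces the BOM bytes for each charset.
        byte[] utf8 = "\uFEFFhello".getBytes(StandardCharsets.UTF_8);
        byte[] utf16le = "\uFEFFhi".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(detect(utf8));
        System.out.println(detect(utf16le));
    }
}
```

<p>Checking the four-byte signatures first is a design choice with a known limitation: UTF-16LE text whose first character after the BOM is U+0000 would be reported as UTF-32LE. A missing BOM proves nothing about the encoding, so a <code>null</code> result should fall back to the XML declaration or other out-of-band information.</p>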
 
