    <p>Java strings are UTF-16. All other encodings can be represented using byte sequences. To decode character data, you must provide the encoding when you first create the string. If you have a corrupted string, it is already too late.</p>
    <p>Assuming ID3, the specifications define the rules for encoding. For example, <a href="http://www.id3.org/id3v2.4.0-structure" rel="nofollow">ID3v2.4.0</a> might restrict the encodings used via an extended header:</p>
    <blockquote> <p>q - Text encoding restrictions</p> <pre><code>
  0    No restrictions
  1    Strings are only encoded with ISO-8859-1 [ISO-8859-1] or
       UTF-8 [UTF-8].
</code></pre> </blockquote>
    <p>Encoding handling is defined further down the document:</p>
    <blockquote> <p>If nothing else is said, strings, including numeric strings and URLs, are represented as ISO-8859-1 characters in the range $20 - $FF. Such strings are represented in frame descriptions as <code>&lt;text string&gt;</code>, or <code>&lt;full text string&gt;</code> if newlines are allowed. If nothing else is said newline character is forbidden. In ISO-8859-1 a newline is represented, when allowed, with $0A only.</p> <p>Frames that allow different types of text encoding contains a text encoding description byte. Possible encodings:</p> <pre><code>
  $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
  $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
        strings in the same frame SHALL have the same byteorder.
        Terminated with $00 00.
  $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
        Terminated with $00 00.
  $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
</code></pre> </blockquote>
    <p>Use transcoding classes like <code>InputStreamReader</code> or (more likely in this case) the <a href="http://download.oracle.com/javase/6/docs/api/java/lang/String.html#String%28byte%5B%5D,%20java.nio.charset.Charset%29" rel="nofollow"><code>String(byte[],Charset)</code></a> constructor to decode the data. See also <a href="http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html" rel="nofollow">Java: a rough guide to character encoding</a>.</p>
    <hr>
    <p>Parsing the string components of an ID3v2.4.0 data structure would look something like this:</p>
    <pre><code>//untested code
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

public String parseID3String(DataInputStream in) throws IOException {
  String[] encodings = { "ISO-8859-1", "UTF-16", "UTF-16BE", "UTF-8" };
  // The first byte selects the text encoding ($00 to $03);
  // readUnsignedByte() throws EOFException instead of returning -1.
  String encoding = encodings[in.readUnsignedByte()];
  // UTF-16 variants end with $00 00; the single-byte encodings end with $00.
  byte[] terminator = encoding.startsWith("UTF-16") ? new byte[2] : new byte[1];
  byte[] buf = terminator.clone();
  ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  while (true) {
    in.readFully(buf);
    if (Arrays.equals(terminator, buf)) {
      break; // keep the terminator out of the decoded text
    }
    buffer.write(buf);
  }
  return new String(buffer.toByteArray(), encoding);
}
</code></pre>
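<p>As a complement, here is a minimal sketch of the decoding step on its own, assuming the frame's text bytes have already been read and the terminator stripped. The class name <code>Id3TextDecoding</code> and the method <code>decode</code> are hypothetical names chosen for illustration, and <code>StandardCharsets</code> requires Java 7 or later. It also shows why a wrongly decoded string is "already too late": the same UTF-8 bytes decoded as ISO-8859-1 produce different characters.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any ID3 library.
public class Id3TextDecoding {
    // Charsets indexed by the ID3v2.4.0 text encoding byte ($00 to $03).
    private static final Charset[] ENCODINGS = {
        StandardCharsets.ISO_8859_1, // $00
        StandardCharsets.UTF_16,     // $01, byte order taken from the BOM
        StandardCharsets.UTF_16BE,   // $02, no BOM
        StandardCharsets.UTF_8       // $03
    };

    /** Decodes text bytes (terminator already removed) for the given encoding byte. */
    public static String decode(int encodingByte, byte[] data) {
        if (encodingByte < 0 || encodingByte >= ENCODINGS.length) {
            throw new IllegalArgumentException("Unknown text encoding byte: " + encodingByte);
        }
        return new String(data, ENCODINGS[encodingByte]);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes C3 A9.
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 };
        String right = decode(3, utf8); // one character: U+00E9
        String wrong = decode(0, utf8); // two characters: U+00C3 U+00A9
        System.out.println(right.length() + " vs " + wrong.length());
    }
}
```

<p>Passing a <code>Charset</code> rather than a charset name avoids the checked <code>UnsupportedEncodingException</code> that the <code>String(byte[],String)</code> constructor declares.</p>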
 
