<p>Bytes are not characters.</p>
<p>In computing, a "character" is a pairing of a numeric code (or sequence of codes) with an <em>encoding</em> or <em>character set</em> that defines how the codes map to real-world characters (or to whitespace, or to control codes).</p>
<p><em>Only once paired with an encoding</em> can bytes represent characters. For some encodings (like ASCII or ISO-8859-1), one byte can represent one character...and many encodings are even ASCII-compatible (meaning that the character codes from 0 to 127 align with ASCII's definition for them)...but without the original mapping, you don't know what you have.</p>
<p><strong>Without an encoding, bytes are just 8-bit integers.</strong></p>
<p>You can interpret them any way you like, and you might even get something usable...but without knowing the encoding, you don't know for sure what they represent.</p>
<p>It might not even be text.</p>
<p>For example, consider the byte sequence <code>0x48 0x65 0x6c 0x6c 0x6f 0x2e</code>. It can be interpreted as:</p>
<ul>
<li><code>Hello.</code> in ASCII and compatible 8-bit encodings;</li>
<li><code>dinner</code> in some 8-bit encoding I made up just to prove this point;</li>
<li><code>䡥汬漮</code> in big-endian UTF-16<sup>*</sup>;</li>
<li>a steel-blue pixel followed by a greyish-yellowish one, in RGB;</li>
<li><code>load r101, [0x6c6c6f2e]</code> in some unknown processor's assembly language;</li>
</ul>
<p>or any of a million other things. Those six bytes alone can't tell you which interpretation is correct.</p>
<p>With text, at least, that's what encodings are for.</p>
<p>But if you want the interpretation to be right, you need to decode those bytes with the same encoding that was used to generate them. That's why it's so important to know how your text was encoded.</p>
<hr>
<p>The difference between a byte stream and a character stream is that the character stream attempts to work with characters rather than bytes. (It actually works with UTF-16 code units. But since we know the encoding, that's good enough for most purposes.) If it's wrapped around a byte stream, the character stream uses an encoding to convert the bytes read from the underlying byte stream into <code>char</code>s (or to convert <code>char</code>s written to the stream into bytes).</p>
<p><sup>* Note: I don't know whether "䡥汬漮" is profanity or even makes any sense...but neither does a computer unless you program it to read Chinese.</sup></p>
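<p>As a rough illustration of both points, here is a minimal Java sketch. It decodes the same six bytes with different encodings, then reads them through a character stream wrapped around a byte stream. The class name <code>BytesVsChars</code> and the helpers <code>decode</code> and <code>readAll</code> are made up for this example; only the JDK classes (<code>String</code>, <code>InputStreamReader</code>, <code>StandardCharsets</code>) are real.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class BytesVsChars {
    // The six bytes from the example: 0x48 0x65 0x6c 0x6c 0x6f 0x2e.
    static final byte[] BYTES = {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2e};

    // Decode a whole byte array in one step, using an explicit encoding.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Wrap a byte stream in a character stream and read chars until EOF.
    // The reader uses the given encoding to turn bytes into UTF-16 code units.
    static String readAll(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String ascii = decode(BYTES, StandardCharsets.US_ASCII);
        String latin1 = decode(BYTES, StandardCharsets.ISO_8859_1);
        String utf16 = readAll(BYTES, StandardCharsets.UTF_16BE);

        System.out.println(ascii);                 // Hello.
        System.out.println(ascii.equals(latin1));  // true: ISO-8859-1 is ASCII-compatible
        System.out.println(utf16.length());        // 3: six bytes became three chars
        System.out.println(utf16.equals(ascii));   // false: same bytes, different text
    }
}
```

<p>The same six bytes come out as six characters under ASCII and as three under big-endian UTF-16, which is why the encoding is passed explicitly here rather than left to the platform default.</p>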