    <p>From reading your post, I think you may be unfamiliar with the concept of encodings.</p>
    <p>For instance, you say "unicode of ... ₤ is 0x00A3". That is true: Unicode codepoint U+00A3 is the pound sign, £. (Strictly, the glyph ₤ is the lira sign, U+20A4.) But 0x00A3 is not how you represent the pound sign in, for example, UTF-8 (a particularly common encoding of Unicode). Take a <a href="http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=00A3&amp;mode=hex" rel="nofollow">look here</a> to see what I mean. As you can see, the UTF-8 encoding of U+00A3 is the two bytes <code>0xc2</code>, <code>0xa3</code> (in that order).</p>
    <p>There are several things that happen between your call to <code>printf()</code> and when something appears on your screen.</p>
    <p>First, your program runs the code <code>printf("abc\x0fdef")</code>, which is intended to write the following bytes, in order, to stdout:</p>
    <pre><code>0x61, 0x62, 0x63, 0x0f, 0x64, 0x65, 0x66
</code></pre>
    <p>Note: I'm assuming your source code is ASCII (or UTF-8), which is very common. Technically, the interpretation of your source code's character set is implementation-defined, I believe. Also, strictly speaking, a <code>\x</code> escape in C consumes every hex digit that follows it, so <code>\x0fdef</code> is read as a single out-of-range escape; writing <code>"abc\x0f" "def"</code> (two adjacent literals, which the compiler concatenates) gives exactly the bytes listed above.</p>
    <p>Now, in order to see output, you will typically be running this program inside some kind of shell or terminal, which eventually has to turn those bytes into visible characters. It does this by using an encoding. Again, something ASCII-compatible is common, such as UTF-8. 
On Windows, CP1252 is common.</p>
    <p>And if that is the case, you get the following mapping:</p>
    <pre><code>0x61 - a
0x62 - b
0x63 - c
0x0f - the 'shift in' ASCII control code
0x64 - d
0x65 - e
0x66 - f
</code></pre>
    <p>This prints out as "abcdef" because the 'shift in' control code is a non-printing character.</p>
    <p>Note: The above can change depending on exactly which character sets are involved, but ASCII or UTF-8 is very likely what you're dealing with unless you have an exotic setup.</p>
    <p>If you have a UTF-8 compatible terminal, the following should print out "abc£def", just as an example to get you started:</p>
    <pre><code>printf("abc\xc2\xa3" "def");
</code></pre>
    <p>The string is written as two adjacent literals so that the <code>\xa3</code> escape does not swallow the hex digits <code>d</code>, <code>e</code>, <code>f</code> that follow it.</p>
    <p>Make sense?</p>
    <hr>
    <p><strong>Update:</strong> To answer the question from your comment: you need to distinguish between a <em>codepoint</em> and the byte values for an <em>encoding</em> of that codepoint.</p>
    <p>The Unicode standard defines 'codepoints', which are numerical values for characters. These are commonly written as U+XYZ, where XYZ is a hexadecimal value. For instance, the character U+219e is <a href="http://www.fileformat.info/info/unicode/char/219e/index.htm" rel="nofollow">LEFTWARDS TWO HEADED ARROW</a>. This might also be written 0x219e. You would know from context that the writer is talking about a codepoint.</p>
    <p>When you need to encode that codepoint (to print it, save it to a file, etc.), you use an encoding, such as UTF-8. Note that if you used, for example, the <a href="http://en.wikipedia.org/wiki/UTF-32" rel="nofollow">UTF-32</a> encoding, every codepoint corresponds exactly to the encoded value. So in UTF-32, the codepoint U+219e would indeed be encoded simply as the single 32-bit value 0x0000219e. But other encodings do things differently. UTF-8 encodes U+219e as the three bytes <code>0xE2 0x86 0x9E</code>.</p>
    <p>Lastly, the <code>\x</code> notation is simply how you write arbitrary byte values inside a C/C++ quoted string. 
If I write, in C source code, <code>"\xff"</code>, then that string in memory will be the two bytes <code>0xff 0x00</code> (since it automatically gets a null terminator).</p>
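<p>To make the codepoint-versus-encoding distinction concrete, here is a minimal sketch of how a codepoint can be turned into its UTF-8 bytes in C. The function name <code>utf8_encode</code> is my own, hypothetical choice (it is not part of the C standard library), and the sketch assumes the caller supplies a buffer of at least four bytes.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode codepoint as UTF-8.
 * Writes up to 4 bytes into out and returns how many were written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp <= 0x7F) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

<p>Calling <code>utf8_encode(0x00A3, buf)</code> should fill <code>buf</code> with <code>0xC2 0xA3</code>, and <code>utf8_encode(0x219E, buf)</code> with <code>0xE2 0x86 0x9E</code>, matching the byte values discussed above.</p>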