<p><code>char</code> is for 8-bit code units, <code>char16_t</code> is for 16-bit code units, and <code>char32_t</code> is for 32-bit code units. Any of these can be used for 'Unicode': UTF-8 uses 8-bit code units, UTF-16 uses 16-bit code units, and UTF-32 uses 32-bit code units.</p>
<hr>
<p>The guarantee made for <code>wchar_t</code> was that any character supported in a locale could be converted from <code>char</code> to <code>wchar_t</code>, and that whatever representation was used for <code>char</code> (multiple bytes, shift codes, what have you), the <code>wchar_t</code> would be a single, distinct value. The purpose was to let you manipulate <code>wchar_t</code> strings with the same simple algorithms used for ASCII.</p>
<p>For example, converting ASCII to uppercase goes like:</p>
<pre><code>#include &lt;locale&gt;

auto loc = std::locale("");
char s[] = "hello";
// Uppercase one char at a time; this only works for single-byte characters.
for (char &amp;c : s) {
  c = std::toupper(c, loc);
}
</code></pre>
<p>But this won't handle converting all characters in UTF-8 to uppercase, or all characters in some other encoding like Shift-JIS. People wanted to be able to internationalize this code like so:</p>
<pre><code>#include &lt;locale&gt;

auto loc = std::locale("");
wchar_t s[] = L"hello";
// Same loop, but each element is meant to be one whole character.
for (wchar_t &amp;c : s) {
  c = std::toupper(c, loc);
}
</code></pre>
<p>So every <code>wchar_t</code> is a 'character', and if it has an uppercase version then it can be converted directly. Unfortunately this doesn't work all the time. For example, there are oddities in some languages, such as the German letter ß, whose uppercase version is the two characters SS instead of a single character.</p>
<p>So internationalized text handling is intrinsically harder than ASCII and cannot really be simplified in the way the designers of <code>wchar_t</code> intended. As such, <code>wchar_t</code> and wide characters in general provide little value.</p>
<p>The only reason to use them is that they've been baked into some APIs and platforms.
However, I prefer to stick to UTF-8 in my own code even when developing on such platforms, and to just convert at the API boundaries to whatever encoding is required.</p>
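<p>To make "convert at the API boundaries" concrete, here is a minimal sketch of such a conversion, assuming the boundary wants UTF-16. The helper name <code>utf8_to_utf16</code> is hypothetical and not part of any library; it keeps UTF-8 in a <code>std::string</code> and produces a <code>std::u16string</code>. Validation is deliberately minimal (overlong encodings are not rejected), so treat it as an illustration of the code-unit arithmetic rather than a hardened converter.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration, not a library API):
// decode UTF-8 stored in std::string and re-encode it as UTF-16.
// Minimal validation only: overlong encodings are NOT rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;      // decoded code point
        std::size_t len = 0;  // number of 8-bit code units in this sequence

        // The lead byte tells us how many code units the sequence uses.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point not encodable in UTF-16");

        if (cp < 0x10000) {
            // Fits in a single 16-bit code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Needs a surrogate pair: two 16-bit code units.
            cp -= 0x10000;
            assert(cp <= 0xFFFFF);  // guaranteed by the range check above
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

<p>With this sketch, <code>utf8_to_utf16("\xC3\x9F")</code> (the two UTF-8 code units for ß) yields one 16-bit code unit, while a character outside the Basic Multilingual Plane such as U+1F600 yields two. In real code you would probably call the platform's own conversion routine at the boundary instead of hand-rolling one.</p>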
 
