StackOverflow2013


plurals

PO
primarykey
Id
17177904
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
9
CommunityOwnedDate
CreationDate
2013-06-18T20:06:00.460
FavoriteCount
0
LastActivityDate
2014-06-12T05:34:15.950
LastEditDate
2014-06-12T05:34:15.950
LastEditorUserId
14558
OwnerUserId
14558
ParentId
1259084
PostTypeId
2
Score
336
ViewCount
0
LastEditorDisplayName
text
Body
Yes, it’s frustrating—sometimes <code>type</code> and other programs print gibberish, and sometimes they do not. First of all, Unicode characters will only display <a href="http://support.microsoft.com/kb/99795">if the current console font contains the characters</a>. So use a TrueType font like Lucida Console instead of the default Raster Font. But if the console font doesn’t contain the character you’re trying to display, you’ll see question marks instead of gibberish. When you get gibberish, there’s more going on than just font settings. When programs use standard C-library I/O functions like <code>printf</code>, the program’s output encoding must match the console’s output encoding, or you will get gibberish. <code>chcp</code> shows and sets the current codepage. All output using standard C-library I/O functions is treated as if it is in the codepage displayed by <code>chcp</code>. Matching the program’s output encoding with the console’s output encoding can be accomplished in two different ways: <ul> <li>A program can get the console’s current codepage using <code>chcp</code> or <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms683169(v=vs.85).aspx"><code>GetConsoleOutputCP</code></a>, and configure itself to output in that encoding, or</li> <li>You or a program can set the console’s current codepage using <code>chcp</code> or <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms686036(v=vs.85).aspx"><code>SetConsoleOutputCP</code></a> to match the default output encoding of the program.</li> </ul> However, programs that use Win32 APIs can write UTF-16LE strings directly to the console with <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms687401(v=vs.85).aspx"><code>WriteConsoleW</code></a>. This is the only way to get correct output without setting codepages. 
And even when using that function, if a string is not in the UTF-16LE encoding to begin with, a Win32 program must pass the correct codepage to <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx"><code>MultiByteToWideChar</code></a>. Also, <code>WriteConsoleW</code> will not work if the program’s output is redirected; more fiddling is needed in that case.
<code>type</code> works some of the time because it checks the start of each file for a UTF-16LE <a href="https://en.wikipedia.org/wiki/Byte_order_mark">Byte Order Mark (BOM)</a>, i.e. the bytes <code>0xFF 0xFE</code>. If it finds such a mark, it displays the Unicode characters in the file using <code>WriteConsoleW</code> regardless of the current codepage. But when you <code>type</code> a file without a UTF-16LE BOM, or use non-ASCII characters with any command that doesn’t call <code>WriteConsoleW</code>, you will need to set the console codepage and the program’s output encoding to match each other.
<hr>
How can we find this out? Here’s a test file containing Unicode characters:
<pre><code>ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
</code></pre>
Here’s a Java program to print out the test file in a bunch of different Unicode encodings. It could be in any programming language; it only prints ASCII characters or encoded bytes to <code>stdout</code>.
<pre class="lang-java prettyprint-override"><code>import java.io.*;

public class Foo {

    private static final String BOM = "\ufeff";
    private static final String TEST_STRING
        = "ASCII abcde xyz\n"
        + "German äöü ÄÖÜ ß\n"
        + "Polish ąęźżńł\n"
        + "Russian абвгдеж эюя\n"
        + "CJK 你好\n";

    public static void main(String[] args) throws Exception {
        String[] encodings = new String[] {
            "UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE" };

        for (String encoding: encodings) {
            System.out.println("== " + encoding);

            for (boolean writeBom: new Boolean[] {false, true}) {
                System.out.println(writeBom ?
"= bom" : "= no bom"); String output = (writeBom ? BOM : "") + TEST_STRING; byte[] bytes = output.getBytes(encoding); System.out.write(bytes); FileOutputStream out = new FileOutputStream("uc-test-" + encoding + (writeBom ? "-bom.txt" : "-nobom.txt")); out.write(bytes); out.close(); } } } } </code></pre> The output in the default codepage? Total garbage! <pre><code>Z:\andrew\projects\sx\1259084>chcp Active code page: 850 Z:\andrew\projects\sx\1259084>java Foo == UTF-8 = no bom ASCII abcde xyz German ├ñ├Â├╝ ├ä├û├£ ├ƒ Polish ─à─Ö┼║┼╝┼ä┼é Russian ð░ð▒ð▓ð│ð┤ðÁðÂ ÐìÐÄÐÅ CJK õ¢áÕÑ¢ = bom ´╗┐ASCII abcde xyz German ├ñ├Â├╝ ├ä├û├£ ├ƒ Polish ─à─Ö┼║┼╝┼ä┼é Russian ð░ð▒ð▓ð│ð┤ðÁðÂ ÐìÐÄÐÅ CJK õ¢áÕÑ¢ == UTF-16LE = no bom A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺↓☺z☺|☺D☺B☺ R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦ C J K `O}Y = bom ■A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺↓☺z☺|☺D☺B☺ R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦ C J K `O}Y == UTF-16BE = no bom A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣☺↓☺z☺|☺D☺B R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O C J K O`Y} = bom ■ A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣☺↓☺z☺|☺D☺B R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O C J K O`Y} == UTF-32LE = no bom A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺ ↓☺ z☺ |☺ D☺ B☺ R u s s i a n 0♦ 1♦ 2♦ 3♦ 4♦ 5♦ 6♦ M♦ N ♦ O♦ C J K `O }Y = bom ■ A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺ ↓☺ z☺ |☺ D☺ B☺ R u s s i a n 0♦ 1♦ 2♦ 3♦ 4♦ 5♦ 6♦ M♦ N ♦ O♦ C J K `O }Y == UTF-32BE = no bom A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O C J K O` Y} = bom ■ A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O C J K O` Y} </code></pre> However, what if we <code>type</code> the files that got saved? 
They contain the exact same bytes that were printed to the console. <pre><code>Z:\andrew\projects\sx\1259084>type *.txt uc-test-UTF-16BE-bom.txt ■ A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣☺↓☺z☺|☺D☺B R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O C J K O`Y} uc-test-UTF-16BE-nobom.txt A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣☺↓☺z☺|☺D☺B R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O C J K O`Y} uc-test-UTF-16LE-bom.txt ASCII abcde xyz German äöü ÄÖÜ ß Polish ąęźżńł Russian абвгдеж эюя CJK 你好 uc-test-UTF-16LE-nobom.txt A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺↓☺z☺|☺D☺B☺ R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦ C J K `O}Y uc-test-UTF-32BE-bom.txt ■ A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O C J K O` Y} uc-test-UTF-32BE-nobom.txt A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O C J K O` Y} uc-test-UTF-32LE-bom.txt A S C I I a b c d e x y z G e r m a n ä ö ü Ä Ö Ü ß P o l i s h ą ę ź ż ń ł R u s s i a n а б в г д е ж э ю я C J K 你好 uc-test-UTF-32LE-nobom.txt A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺ ↓☺ z☺ |☺ D☺ B☺ R u s s i a n 0♦ 1♦ 2♦ 3♦ 4♦ 5♦ 6♦ M♦ N ♦ O♦ C J K `O }Y uc-test-UTF-8-bom.txt ´╗┐ASCII abcde xyz German ├ñ├Â├╝ ├ä├û├£ ├ƒ Polish ─à─Ö┼║┼╝┼ä┼é Russian ð░ð▒ð▓ð│ð┤ðÁðÂ ÐìÐÄÐÅ CJK õ¢áÕÑ¢ uc-test-UTF-8-nobom.txt ASCII abcde xyz German ├ñ├Â├╝ ├ä├û├£ ├ƒ Polish ─à─Ö┼║┼╝┼ä┼é Russian ð░ð▒ð▓ð│ð┤ðÁðÂ ÐìÐÄÐÅ CJK õ¢áÕÑ¢ </code></pre> The only thing that works is UTF-16LE file, with a BOM, printed to the console via <code>type</code>. If we use anything other than <code>type</code> to print the file, we get garbage: <pre><code>Z:\andrew\projects\sx\1259084>copy uc-test-UTF-16LE-bom.txt CON ■A S C I I a b c d e x y z G e r m a n õ ÷ ³ ─ Í ▄ ▀ P o l i s h ♣☺↓☺z☺|☺D☺B☺ R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦ C J K `O}Y 1 file(s) copied. 
</code></pre>
From the fact that <code>copy CON</code> does not display Unicode correctly, we can conclude that the <code>type</code> command has logic to detect a UTF-16LE BOM at the start of the file, and use special Windows APIs to print it.
We can see this by opening <code>cmd.exe</code> in a debugger when it goes to <code>type</code> out a file:
<img src="https://i.stack.imgur.com/XcX2H.png" alt="enter image description here">
After <code>type</code> opens a file, it checks for a BOM of <code>0xFEFF</code> (i.e., the bytes <code>0xFF 0xFE</code> in little-endian), and if there is such a BOM, <code>type</code> sets an internal <code>fOutputUnicode</code> flag. This flag is checked later to decide whether to call <code>WriteConsoleW</code>.
But that’s the only way to get <code>type</code> to output Unicode, and only for files that have BOMs and are in UTF-16LE. For all other files, and for programs that don’t have special code to handle console output, your files will be interpreted according to the current codepage, and will likely show up as gibberish.
You can emulate how <code>type</code> outputs Unicode to the console in your own programs like so:
<pre class="lang-c prettyprint-override"><code>#include <stdio.h>
#include <string.h>
#define UNICODE
#include <windows.h>

static LPCSTR lpcsTest =
    "ASCII abcde xyz\n"
    "German äöü ÄÖÜ ß\n"
    "Polish ąęźżńł\n"
    "Russian абвгдеж эюя\n"
    "CJK 你好\n";

int main() {
    int n;
    DWORD written;
    wchar_t buf[1024];

    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);

    /* Convert UTF-8 to UTF-16LE. The last argument is the buffer size
       in wide characters, not in bytes. */
    n = MultiByteToWideChar(
        CP_UTF8, 0,
        lpcsTest, (int)strlen(lpcsTest),
        buf, sizeof(buf) / sizeof(buf[0]));

    /* With UNICODE defined, WriteConsole resolves to WriteConsoleW. */
    WriteConsole(hConsole, buf, n, &written, NULL);

    return 0;
}
</code></pre>
This program prints Unicode correctly on the Windows console using the default codepage.
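To make the BOM check concrete, here is a minimal Java sketch of the test that <code>type</code> is described as performing: look at the first two bytes, and treat the data as UTF-16LE only if they are <code>0xFF 0xFE</code>. The names <code>BomCheck</code>, <code>hasUtf16LeBom</code> and <code>decodeIfUtf16Le</code> are hypothetical and are not taken from <code>cmd.exe</code>; this only illustrates the idea, not the actual implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from cmd.exe): mimics the BOM test described above.
final class BomCheck {

    // True if the data starts with the UTF-16LE byte order mark (bytes 0xFF 0xFE).
    static boolean hasUtf16LeBom(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0xFF
            && (data[1] & 0xFF) == 0xFE;
    }

    // Decodes the bytes after the BOM as UTF-16LE. Returns null when there is
    // no BOM, i.e. the case where the bytes would be left to the codepage.
    static String decodeIfUtf16Le(byte[] data) {
        if (!hasUtf16LeBom(data)) {
            return null;
        }
        return new String(data, 2, data.length - 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Non-ASCII characters are written as escapes so the source file
        // encoding does not matter.
        byte[] withBom = "\ufeff\u00e4\u00f6\u00fc".getBytes(StandardCharsets.UTF_16LE);
        byte[] noBom = "\u00e4\u00f6\u00fc".getBytes(StandardCharsets.UTF_16LE);
        System.out.println("with BOM detected: " + hasUtf16LeBom(withBom));
        System.out.println("without BOM detected: " + hasUtf16LeBom(noBom));
    }
}
```

A two-byte check like this also accepts a UTF-32LE BOM (<code>FF FE 00 00</code>), which appears consistent with how <code>uc-test-UTF-32LE-bom.txt</code> was displayed above: readable characters with a gap after each one.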
<hr>
For the sample Java program, we can get a little bit of correct output by setting the codepage manually, though the output gets messed up in weird ways:
<pre><code>Z:\andrew\projects\sx\1259084>chcp 65001
Active code page: 65001

Z:\andrew\projects\sx\1259084>java Foo
== UTF-8
= no bom
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
ж эюя CJK 你好你好好 �
= bom
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
еж эюя CJK 你好你好好 �
== UTF-16LE
= no bom
A S C I I a b c d e x y z
…
</code></pre>
However, a C program that sets a Unicode UTF-8 codepage:
<pre class="lang-c prettyprint-override"><code>#include <stdio.h>
#include <windows.h>

int main() {
    size_t n;
    UINT oldCodePage;
    char buf[1024];

    /* Remember the current codepage so it can be restored on exit. */
    oldCodePage = GetConsoleOutputCP();
    if (!SetConsoleOutputCP(65001)) {
        printf("error\n");
    }

    /* Read the UTF-8 test file and pass its bytes through unchanged. */
    freopen("uc-test-UTF-8-nobom.txt", "rb", stdin);
    n = fread(buf, sizeof(buf[0]), sizeof(buf), stdin);
    fwrite(buf, sizeof(buf[0]), n, stdout);

    /* Flush while codepage 65001 is still active, then restore. */
    fflush(stdout);
    SetConsoleOutputCP(oldCodePage);

    return 0;
}
</code></pre>
does have correct output:
<pre><code>Z:\andrew\projects\sx\1259084>.\test
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
</code></pre>
<hr>
The moral of the story?
<ul>
<li><code>type</code> can print UTF-16LE files with a BOM regardless of your current codepage.</li>
<li>Win32 programs can be programmed to output Unicode to the console, using <code>WriteConsoleW</code>.</li>
<li>Other programs that set the codepage and adjust their output encoding accordingly can print Unicode on the console regardless of what the codepage was when the program started.</li>
<li>For everything else you will have to mess around with <code>chcp</code>, and will probably still get weird output.</li>
</ul>
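As a rough illustration of "adjusting the output encoding to the codepage" from a Java program, the sketch below wraps <code>System.out</code> in a <code>PrintStream</code> with an explicit charset. The class name <code>ConsoleEncoding</code> and the helper <code>encodeFor</code> are hypothetical, and the charset passed in is assumed to match whatever <code>chcp</code> reports (for example <code>UTF-8</code> after <code>chcp 65001</code>). Given the partly garbled Java output shown above under codepage 65001, this should be read as a sketch of the idea, not as a guaranteed fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: shows the bytes a PrintStream emits for a given charset.
final class ConsoleEncoding {

    // Encodes text the way a PrintStream configured with charsetName would write it.
    static byte[] encodeFor(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0], if given, names the charset that matches the
        // console's active codepage; otherwise UTF-8 (codepage 65001) is assumed.
        String charsetName = args.length > 0 ? args[0] : "UTF-8";
        PrintStream console = new PrintStream(System.out, true, charsetName);
        console.println("German \u00e4\u00f6\u00fc \u00c4\u00d6\u00dc \u00df");
    }
}
```

Running it as <code>java ConsoleEncoding UTF-8</code> after <code>chcp 65001</code> is the intended pairing; passing a different charset name lets the same program follow a different codepage instead of changing the console.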
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POWhat encoding/code page is cmd.exe using?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USandrewdotn
UserOwnerUserId
1. USandrewdotn
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POWhat encoding/code page is cmd.exe using?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COWhoa, this must be the most detailed answer I've ever seen on SO. Extra credit for the disassembly prints and multilanguage skillz! Just beautiful, sir!
 singulars
 PostPostId
 PO
 UserUserId
 USAndy Terra
