<p>(I'm assuming you're on Windows, since using U+FEFF as a signature in UTF-8 files is mostly a Windows thing and should simply be avoided elsewhere.)</p>
<p>You could open the file as a UTF-8 file and then check whether the first character is U+FEFF. You can do this by opening a normal <code>char</code>-based file stream and then using <code>wbuffer_convert</code> to treat it as a series of code units in another encoding. VS2010 doesn't yet have good support for <code>char32_t</code>, so the following uses UTF-16 in <code>wchar_t</code>.</p>
<pre><code>#include &lt;codecvt&gt;
#include &lt;fstream&gt;
#include &lt;locale&gt;

std::ifstream fs(filename);
std::wbuffer_convert&lt;std::codecvt_utf8_utf16&lt;wchar_t&gt;, wchar_t&gt; wb(fs.rdbuf());
std::wistream is(&amp;wb);
// If you don't do this on the stack, remember to destroy the objects in
// reverse order of creation: is, then wb, then fs.

std::wistream::int_type ch = is.get();
const std::wistream::int_type ZERO_WIDTH_NO_BREAK_SPACE = 0xFEFF;
if (ch != ZERO_WIDTH_NO_BREAK_SPACE &amp;&amp;
    ch != std::wistream::traits_type::eof())
    is.putback(std::wistream::traits_type::to_char_type(ch));

// Now the stream can be passed around and used without worrying about
// the extra character at its start.
int i;
readFromStream&lt;int&gt;(is, i);
</code></pre>
<p>Remember that this should be done on the file stream as a whole, not inside <code>readFromFile</code> on your <code>stringstream</code>, because U+FEFF should be ignored only when it is the very first character of the whole file, if at all. It shouldn't be skipped anywhere else.</p>
<p>On the other hand, if you're happy using a <code>char</code>-based stream and just want to skip U+FEFF if present, then James Kanze's suggestion seems good, so here's an implementation:</p>
<pre><code>#include &lt;fstream&gt;
#include &lt;iostream&gt;

std::ifstream fs(filename);
char a, b, c;
a = fs.get();
b = fs.get();
c = fs.get();
if (a != (char)0xEF || b != (char)0xBB || c != (char)0xBF) {
    // Not a UTF-8 signature: clear the eof/fail flags a file shorter than
    // three bytes would leave set, then rewind to the beginning.
    fs.clear();
    fs.seekg(0);
} else {
    std::cerr &lt;&lt; "Warning: file contains the so-called 'UTF-8 signature'\n";
}
</code></pre>
<hr>
<p>Additionally, if you want to use <code>wchar_t</code> internally, the <code>codecvt_utf8_utf16</code> and <code>codecvt_utf8</code> facets have a mode that can consume 'BOMs' for you.
The only problem is that <code>wchar_t</code> is widely recognized to be worthless these days*, so you probably shouldn't do this.</p>
<pre><code>std::wifstream fin(filename);
fin.imbue(std::locale(fin.getloc(),
    new std::codecvt_utf8_utf16&lt;wchar_t, 0x10FFFF, std::consume_header&gt;));
</code></pre>
<p><sub>* <code>wchar_t</code> is worthless because it is specified to do just one thing: provide a fixed-size data type that can represent any code point in a locale's character repertoire. It does not provide a common representation <em>between</em> locales (i.e., the same <code>wchar_t</code> value can be different characters in different locales, so you cannot necessarily convert to <code>wchar_t</code>, switch to another locale, and then convert back to <code>char</code> in order to do <code>iconv</code>-like encoding conversions).</sub></p>
<p><sub>The fixed-size representation itself is worthless for two reasons. First, the meaning of text is often carried by sequences of code points rather than single ones (a base letter followed by combining marks, for instance), so understanding text means you have to process multiple code points anyway. Second, some platforms, such as Windows, use UTF-16 as the <code>wchar_t</code> encoding, which means a single <code>wchar_t</code> isn't even necessarily a code point value. (Whether using UTF-16 this way even conforms to the standard is ambiguous. The standard requires that every character supported by a locale be representable as a single <code>wchar_t</code> value; if no locale supports any character outside the BMP, then UTF-16 could be seen as conformant.)</sub></p>