UTF-8, CString and CFile? (C++, MFC)
<p>I'm currently working on an MFC program that has to work with UTF-8. At some point, I have to write UTF-8 data into a file; to do that, I'm using CFile and CString.</p>
<p>When I write UTF-8 data (Russian characters, to be more precise) into a file, the output looks like</p>
<pre><code>Ðàñïå÷àòàíî: Ñèñòåìà Ïðîèçâîäñòâî
</code></pre>
<p>and so on. This is surely not UTF-8. To read this data properly, I have to change my system settings; changing the non-ASCII character setting to a Russian encoding table does work, but then all my Latin-based non-ASCII characters fail. Anyway, this is how I write the file:</p>
<pre><code>CFile CSVFile( m_sCible, CFile::modeCreate|CFile::modeWrite);
CString sWorkingLine;
//Add stuff into sWorkingline
CSVFile.Write(sWorkingLine,sWorkingLine.GetLength());
//Clean sWorkingline and start over
</code></pre>
<p>Am I missing something? Should I use something else instead? Is there some kind of catch I've missed? I'll be tuned in for your wisdom and experience, fellow programmers.</p>
<p>EDIT: Of course, just after asking the question, I found something that might be interesting; it can be found <a href="http://blog.kalmbachnet.de/?postid=105" rel="noreferrer">here</a>. Thought I might share it.</p>
<p>EDIT 2:</p>
<p>Okay, so I added the BOM to my file, which now contains Chinese characters, probably because I didn't convert my line to UTF-8. To add the BOM, I did...</p>
<pre><code>char BOM[3]={0xEF, 0xBB, 0xBF};
CSVFile.Write(BOM,3);
</code></pre>
<p>And after that, I added...</p>
<pre><code>TCHAR TestLine;
//Convert the line to UTF-8 multibyte.
WideCharToMultiByte (CP_UTF8,0,sWorkingLine,sWorkingLine.GetLength(),TestLine,strlen(TestLine)+1,NULL,NULL);
//Add the line to file.
CSVFile.Write(TestLine,strlen(TestLine)+1);
</code></pre>
<p><em>But then I cannot compile, as I don't really know how to get the length of TestLine. strlen doesn't seem to accept TCHAR.</em> Fixed: I used a static length of 1000 instead.</p>
<p>EDIT 3:</p>
<p>So, I added this code...</p>
<pre><code>wchar_t NewLine[1000];
wcscpy( NewLine, CT2CW( (LPCTSTR) sWorkingLine ));
TCHAR* TCHARBuf = new TCHAR[1000];
//Convert the line to UTF-8 multibyte.
WideCharToMultiByte (CP_UTF8,0,NewLine,1000,TCHARBuf,1000,NULL,NULL);
//Find how many characters we have to add
size_t size = 0;
HRESULT hr = StringCchLength(TCHARBuf, MAX_PATH, &amp;size);
//Add the line to the file
CSVFile.Write(TCHARBuf,size);
</code></pre>
<p>It compiles fine, but when I look at my new file, it's exactly the same as when I didn't have all this new code (e.g. Ðàñïå÷àòàíî:). It feels like I haven't made any progress, although I guess only a small thing separates me from victory.</p>
<p>EDIT 4:</p>
<p>I removed the previously added code, as Nate asked, and decided to use his code instead, meaning that now, when I add my line, I have...</p>
<pre><code>CT2CA outputString(sWorkingLine, CP_UTF8);
//Add line to file.
CSVFile.Write(outputString,::strlen(outputString));
</code></pre>
<p>Everything compiles fine, but the Russian characters are shown as ???????. Getting closer, but still not there. By the way, I'd like to thank everyone who has tried to help me; it is MUCH appreciated. I've been stuck on this for a while now, and I can't wait for this problem to be gone.</p>
<p>FINAL EDIT (I hope): The way I originally obtained my UTF-8 characters was wrong for my new way of outputting the text (I was re-encoding without realizing it); once I changed that, I got acceptable results. By adding the UTF-8 BOM at the beginning of my file, it could be read as Unicode in other programs, like Excel.</p>
<p>Hurray! Thank you everyone!</p>
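<p>For anyone landing here later, below is a minimal, portable sketch of what the final approach boils down to: encode the text as UTF-8 bytes, then put the three BOM bytes (0xEF, 0xBB, 0xBF) in front before writing. It deliberately uses only standard C++ so it can be tried outside MFC; the names <code>AppendUtf8</code>, <code>ToUtf8</code> and <code>WithBom</code> are hypothetical helpers of mine, not MFC or Win32 functions. In the actual program the conversion step would be done by <code>CT2CA(..., CP_UTF8)</code> (or <code>WideCharToMultiByte</code>) and the resulting bytes handed to <code>CFile::Write</code>.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an MFC/Win32 API): append one Unicode code point
// to `out` as 1 to 4 UTF-8 bytes.
void AppendUtf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);  // highest valid Unicode code point
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a whole UTF-32 string to UTF-8 bytes.
std::string ToUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        AppendUtf8(out, cp);
    }
    return out;
}

// Hypothetical helper: prefix already-encoded UTF-8 bytes with the UTF-8 BOM,
// which is what lets programs such as Excel detect the encoding.
std::string WithBom(const std::string& utf8) {
    return std::string("\xEF\xBB\xBF") + utf8;
}
```

<p>The byte count passed to the write call has to be the length of the encoded UTF-8 string, not the character count of the original text, because each Russian letter takes two bytes in UTF-8.</p>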