<p>My initial approach could be summarized this way:</p>
<p>1) Reverse the bytes naively.</p>
<p>2) Run through the string backwards and fix the UTF-8 sequences as you go.</p>
<p>Illegal sequences are handled in the second step; in the first step we only check that the string is in "sync" (that is, that it starts with a legal leading byte).</p>
<p>EDIT: improved validation for the leading byte in Reverse()</p>
<pre><code>using System;

class UTF8Utils {

    public static void Reverse(byte[] str) {
        int len = str.Length;
        // nothing to do for an empty buffer (and str[0] would throw)
        if(len == 0) {
            return;
        }
        int i = 0;
        int j = len - 1;

        // first, check if the string is "synced", i.e., it starts
        // with a valid leading byte. Illegal sequences in the rest
        // of the string are checked later.
        byte leadChar = str[0];

        // if it starts with 10xx xxxx, it's a trailing byte...
        // if it starts with 1111 10xx or 1111 110x,
        // it's outside the 4-byte range.
        // EDIT: added validation for 7-byte sequences (0xfe) and 0xff
        if( (leadChar &amp; 0xc0) == 0x80 ||
            (leadChar &amp; 0xfc) == 0xf8 ||
            (leadChar &amp; 0xfe) == 0xfc ||
            leadChar == 0xfe ||
            leadChar == 0xff) {
            throw new Exception("Illegal UTF-8 sequence");
        }

        // reverse bytes in-place naïvely
        while(i &lt; j) {
            byte tmp = str[i];
            str[i] = str[j];
            str[j] = tmp;
            i++;
            j--;
        }

        // now, run through the string again to fix the multibyte sequences
        UTF8Utils.ReverseMbSequences(str);
    }

    private static void ReverseMbSequences(byte[] str) {
        int i = str.Length - 1;
        byte leadChar = 0;
        int nBytes = 0;

        // loop backwards through the reversed buffer
        while(i &gt;= 0) {
            // since the first byte in the unreversed buffer is assumed to be
            // the leading byte of its sequence, it seems safe to assume that
            // the last byte is now a leading byte (given that the string is
            // not out of sync, which we have already checked)
            leadChar = str[i];

            // check how many bytes this sequence takes
            if(leadChar &lt; 0x80) {
                nBytes = 1;
            } else if((leadChar &amp; 0xe0) == 0xc0) {
                nBytes = 2;
            } else if((leadChar &amp; 0xf0) == 0xe0) {
                nBytes = 3;
            } else if((leadChar &amp; 0xf8) == 0xf0) {
                nBytes = 4;
            } else {
                throw new Exception("Illegal UTF-8 sequence");
            }

            // validate against illegal sequences: the sequence must fit in
            // the buffer, and every byte after the leading one must be a
            // trailing byte (10xx xxxx)
            if(nBytes &gt; i + 1) {
                throw new Exception("Illegal UTF-8 sequence");
            }
            for(int k = 1; k &lt; nBytes; k++) {
                if((str[i - k] &amp; 0xc0) != 0x80) {
                    throw new Exception("Illegal UTF-8 sequence");
                }
            }

            // now, reverse the current sequence and then continue
            // with the next one
            int back = i;
            int front = back - nBytes + 1;
            while(front &lt; back) {
                byte tmp = str[front];
                str[front] = str[back];
                str[back] = tmp;
                front++;
                back--;
            }

            i -= nBytes;
        }
    }
}
</code></pre>
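One way to sanity-check the in-place approach is to compare its output against a slower reference that decodes the bytes, reverses by code point, and re-encodes. The sketch below is only that: a cross-check under the assumption that allocating a temporary string is acceptable, not the method described above. The class name `Utf8ReverseReference` and the method name `ReverseByCodePoints` are made up for illustration.

```csharp
using System;
using System.Text;

static class Utf8ReverseReference
{
    // Strict UTF-8: no BOM on output, and malformed input throws
    // instead of being silently replaced with U+FFFD.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Hypothetical helper: returns a new array holding the code points
    // of the input in reverse order. The input is left untouched.
    public static byte[] ReverseByCodePoints(byte[] utf8)
    {
        string s = Strict.GetString(utf8);
        var sb = new StringBuilder(s.Length);

        for (int i = s.Length - 1; i >= 0; i--)
        {
            // A code point above U+FFFF is stored as a surrogate pair in
            // a .NET string; keep the pair in high-low order.
            if (char.IsLowSurrogate(s[i]) && i > 0 && char.IsHighSurrogate(s[i - 1]))
            {
                sb.Append(s[i - 1]).Append(s[i]);
                i--;
            }
            else
            {
                sb.Append(s[i]);
            }
        }

        return Strict.GetBytes(sb.ToString());
    }
}
```

For example, the bytes of "a€b" (61 E2 82 AC 62) should come back as the bytes of "b€a" (62 E2 82 AC 61). Running UTF8Utils.Reverse on a copy of the same input and comparing the two arrays gives a quick regression check for well-formed input.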