<p><a href="https://stackoverflow.com/users/43959/kaii">@Kaii</a>'s answer is almost correct, but it has a bug: it mishandles characters whose code points are in the range 128 to 255, which already take two bytes in UTF-8. Here is the revised version (the fix is changing the threshold from 256 to 128):</p>
<pre><code>function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function substr_utf8_bytes(str, startInBytes, lengthInBytes) {
  /* this function scans a multibyte string and returns a substring.
   * arguments are start position and length, both defined in bytes.
   *
   * this is tricky, because javascript only allows character level
   * and not byte level access on strings. Also, all strings are stored
   * in utf-16 internally - so we need to convert characters to utf-8
   * to detect their length in utf-8 encoding.
   *
   * the startInBytes and lengthInBytes parameters are based on byte
   * positions in a utf-8 encoded string.
   * in utf-8, for example:
   *   "a" is 1 byte, "ü" is 2 bytes, and "你" is 3 bytes.
   *
   * NOTE:
   * according to ECMAScript 262 all strings are stored as a sequence
   * of 16-bit characters. so we need an encode_utf8() function to safely
   * detect the length our character would have in a utf8 representation.
   *
   * http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf
   * see "4.3.16 String Value":
   * &gt; Although each value usually represents a single 16-bit unit of
   * &gt; UTF-16 text, the language does not place any restrictions or
   * &gt; requirements on the values except that they be 16-bit unsigned
   * &gt; integers.
   */

  var resultStr = '';
  var startInChars = 0;
  // declare the loop variables locally instead of leaking them as globals
  var bytePos, ch, end, n;

  // scan string forward to find index of first character
  // (convert start position in bytes to start position in characters);
  // stop at the end of the string if startInBytes lies beyond it
  for (bytePos = 0; bytePos &lt; startInBytes &amp;&amp; startInChars &lt; str.length; startInChars++) {
    // get numeric code of character (is &gt;= 128 for multibyte character)
    // and increase "bytePos" for each byte of the character sequence
    ch = str.charCodeAt(startInChars);
    bytePos += (ch &lt; 128) ? 1 : encode_utf8(str[startInChars]).length;
  }

  // now that we have the position of the starting character,
  // we can build the resulting substring

  // as we don't know the end position in chars yet, we start with a mix of
  // chars and bytes. we decrease "end" by the byte count of each selected
  // character to end up in the right position.
  // note: the loop condition deliberately tests startInChars (not n) against
  // "end"; the loop stops once the requested number of bytes is consumed,
  // or when the end of the string is reached
  end = startInChars + lengthInBytes - 1;
  for (n = startInChars; startInChars &lt;= end &amp;&amp; n &lt; str.length; n++) {
    // get numeric code of character (is &gt;= 128 for multibyte character)
    // and decrease "end" for each byte of the character sequence
    ch = str.charCodeAt(n);
    end -= (ch &lt; 128) ? 1 : encode_utf8(str[n]).length;
    resultStr += str[n];
  }

  return resultStr;
}

// byte layout of orig in utf-8:
// "abc" = bytes 0-2, "你好吗" = bytes 3-11, "?" = byte 12, "©" = bytes 13-14
var orig = 'abc你好吗?©';

alert('res: ' + substr_utf8_bytes(orig, 0, 2));  // alerts: "ab"
alert('res: ' + substr_utf8_bytes(orig, 2, 1));  // alerts: "c"
alert('res: ' + substr_utf8_bytes(orig, 3, 3));  // alerts: "你"
alert('res: ' + substr_utf8_bytes(orig, 6, 6));  // alerts: "好吗"
alert('res: ' + substr_utf8_bytes(orig, 13, 2)); // alerts: "©"
</code></pre>
<p>By the way, this is a bug fix, and it should be useful to anyone who hits the same problem. Why did the reviewers reject my suggested edit as changing "too much" or being "too minor"? <a href="https://stackoverflow.com/users/927844/adam-eberlin">@Adam Eberlin</a> <a href="https://stackoverflow.com/users/904365/kjuly">@Kjuly</a> <a href="https://stackoverflow.com/users/775849/jasonw">@Jasonw</a></p>
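To see why the threshold has to be 128 rather than 256, it may help to check the UTF-8 byte length of a few characters on either side of those boundaries. The following is only a minimal sketch: `utf8ByteLength` is a hypothetical helper that is not part of the answer above, and it assumes an environment where the standard `TextEncoder` is available (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper (not from the answer above): UTF-8 byte length of a string.
// Assumes the standard TextEncoder global is available.
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

// Characters around the 128 and 256 boundaries, plus one CJK character.
var samples = ['a', '\u007f', '\u0080', '©', 'ü', 'ÿ', '你'];

samples.forEach(function (ch) {
  // Prints the UTF-16 code unit value and the UTF-8 byte length.
  console.log(ch.charCodeAt(0) + ' -> ' + utf8ByteLength(ch) + ' byte(s)');
});
```

Code units 97 and 127 come out as one byte, while 128, 169 (©), 252 (ü) and 255 (ÿ) already need two, so a shortcut of the form `ch < 256 ? 1 : ...` undercounts exactly the range the answer describes.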