Extract substring by UTF-8 byte positions
<p>I have a string, plus a start and a length with which to extract a substring. Both values (start and length) are byte offsets into the original UTF-8 string.</p>
<p>However, there is a problem:</p>
<p>The start and length are in bytes, so I cannot use "substring". The UTF-8 string contains several multi-byte characters. Is there a hyper-efficient way of doing this? (I don't need to decode the bytes...)</p>
<p>Example: var orig = '你好吗?'</p>
<p>The start and length might be 3,3 to extract the second character (好). I'm looking for</p>
<pre><code>var result = orig.substringBytes(3,3);
</code></pre>
<p>Help!</p>
<p><strong>Update #1</strong> In C/C++ I would just cast it to a byte array, but I'm not sure whether there is an equivalent in JavaScript. BTW, yes, we could convert it into a byte array and convert that back to a string, but it seems there should be a quick way to cut it at the right place. Imagine that 'orig' is 1000000 characters, and s = 6 bytes and l = 3 bytes.</p>
<p><strong>Update #2</strong> Thanks to zerkms's helpful redirection, I ended up with the following, which does <strong>NOT</strong> work right: it works for multi-byte characters but is messed up for single-byte ones.</p>
<pre><code>function substrBytes(str, start, length)
{
    var ch, startIx = 0, endIx = 0, re = '';
    for (var i = 0; i &lt; str.length; i++)
    {
        startIx = endIx++;

        ch = str.charCodeAt(i);
        do {
            ch = ch &gt;&gt; 8;   // a better way may exist to measure ch len
            endIx++;
        }
        while (ch);

        if (endIx &gt; start + length)
        {
            return re;
        }
        else if (startIx &gt;= start)
        {
            re += str[i];
        }
    }
    return re;
}
</code></pre>
<p><strong>Update #3</strong> I don't think shifting the char code really works. I'm reading two bytes when the correct answer is three... somehow I always forget this. The code point is the same for UTF-8 and UTF-16, but the number of bytes it takes up depends on the encoding! So this is not the right way to do this.</p>
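The point made in Update #3 (byte count depends on the encoding, not on the width of the char code) can be turned into a small sketch. This is one possible approach, not a definitive answer: it walks the string once, derives each code point's UTF-8 length from its value, and slices the JavaScript string when the running byte count reaches the requested range. The helper name `utf8Length` is made up for this illustration, and the sketch assumes that `start` and `start + length` fall on character boundaries; a character that straddles the end of the range is included whole.

```javascript
// Number of bytes a Unicode code point occupies when encoded as UTF-8.
// (Hypothetical helper, not part of any library.)
function utf8Length(codePoint) {
  if (codePoint < 0x80) return 1;      // ASCII
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;   // rest of the Basic Multilingual Plane
  return 4;                            // astral planes (surrogate pairs in JS)
}

// Extract a substring given a start and length measured in UTF-8 bytes.
// Assumption: both offsets land on character boundaries.
function substrBytes(str, start, length) {
  var end = start + length;
  var bytePos = 0;   // UTF-8 byte offset of the character at index i
  var from = -1;     // string index where the result starts (-1: not found yet)
  var i = 0;

  while (i < str.length && bytePos < end) {
    var cp = str.codePointAt(i);
    if (from < 0 && bytePos >= start) from = i;
    bytePos += utf8Length(cp);
    i += cp > 0xFFFF ? 2 : 1;  // astral code points take two UTF-16 units
  }
  return from < 0 ? '' : str.slice(from, i);
}

console.log(substrBytes('你好吗?', 3, 3)); // 好
```

The loop stops as soon as the end of the requested range is reached, so for a long string with a small `start` and `length` only the leading characters are examined, and no byte array is ever built.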