Reading stream line by line without knowing its encoding
<p>I have a situation where I need to process some data from a stream line by line. The problem is that the encoding of the data is not known in advance; it might be <code>UTF-8</code> or any legacy single-byte encoding (e.g. <code>Latin1</code>, <code>ISO-8859-5</code>, etc.). It will <em>not</em> be <code>UTF16</code> or an exotic encoding like <code>EBCDIC</code>, so I can reasonably expect <code>\n</code> to be unambiguous, and in theory I can split the data into lines. At some point, when I encounter an empty line, I will need to feed the rest of the stream somewhere else (without splitting it into lines, but still without any re-encoding); think in terms of HTTP-style headers followed by an opaque body.</p>
<p>Here is what I have so far:</p>
<pre><code>function processStream(stream) {
    var buffer = '';

    function splitLines(data) {
        buffer += data;
        var lf = buffer.indexOf('\n');
        while (lf &gt;= 0) {
            // note: length lf - 1 also drops the character just before '\n'
            // (assumed here to be the '\r' of a CRLF line ending)
            var line = buffer.substr(0, lf - 1);
            buffer = buffer.substr(lf + 1);
            this.emit('line', line);
            lf = buffer.indexOf('\n');
        }
    }

    function processHeader(line) {
        if (line.length) {
            // do something with the line
        } else {
            // end of headers, stop splitting lines and start processing the body
            this
                .removeListener('data', splitLines)
                .removeAllListeners('line')
                .on('data', processBody);
            if (buffer.length) {
                // process leftover buffer as part of the body
                processBody(buffer);
                buffer = '';
            }
        }
    }

    function processBody(data) {
        // do something with the body chunks
    }

    stream.setEncoding('binary');
    stream
        .on('data', splitLines)
        .on('line', processHeader);
}
</code></pre>
<p>It does the job, but the problem is that the <code>binary</code> encoding is deprecated and will probably disappear in the future, leaving me without that option. All other <code>Buffer</code> encodings will either mangle the data or fail to decode it altogether if (most likely, when) the data does not match the encoding. Working with <code>Uint8Array</code> instead would mean slow and inconvenient JavaScript loops over the data just to find a newline.</p>
<p>Any suggestions on how to split a stream into lines on the fly, while remaining encoding-agnostic, <em>without</em> using the <code>binary</code> encoding?</p>
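One direction that might avoid both the binary encoding and hand-written byte loops is to leave the chunks as Buffer objects and search for the newline byte with Buffer's own indexOf. The following is only a minimal sketch under that assumption, not a tested replacement for the code above: createHeaderSplitter, onLine and onBody are hypothetical names, and the CRLF handling is a guess based on the HTTP-style framing described in the question.

```javascript
// Hypothetical helper (not part of any library): splits the header part of a
// byte stream into lines and passes everything after the first empty line to
// onBody without decoding or re-encoding it.
function createHeaderSplitter(onLine, onBody) {
  let pending = Buffer.alloc(0); // bytes received but not yet ended by '\n'
  let inBody = false;

  return function onData(chunk) {
    if (inBody) {
      onBody(chunk); // the body is forwarded as raw bytes
      return;
    }
    pending = Buffer.concat([pending, chunk]);
    let lf;
    // 0x0a is '\n'; the search works on bytes, so no encoding is involved
    while ((lf = pending.indexOf(0x0a)) >= 0) {
      // Assumption: a '\r' (0x0d) directly before '\n' belongs to the line ending
      const end = lf > 0 && pending[lf - 1] === 0x0d ? lf - 1 : lf;
      const line = pending.subarray(0, end);
      pending = pending.subarray(lf + 1);
      if (line.length === 0) {
        // Empty line: headers are over, the leftover bytes start the body
        inBody = true;
        if (pending.length) onBody(pending);
        pending = Buffer.alloc(0);
        return;
      }
      onLine(line); // line is a Buffer; decode it once the encoding is known
    }
  };
}
```

It could be attached with something like stream.on('data', createHeaderSplitter(handleHeader, handleBody)), where handleHeader and handleBody are placeholders, and without calling setEncoding, so that the chunks arrive as Buffer objects.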