<p>A deflate stream works on binary data. An arbitrary binary chunk in the middle of a text file is also known as: a corrupt text file. There is no sane way of decoding this:</p>
<ul>
<li>you can't read "lines", because there is no definition of a "line" when talking about binary data; any combination of CR/LF/CRLF/etc could occur completely at random in the binary data</li>
<li>you can't read a "string line", because that suggests you are running the data through an <code>Encoding</code>; but since this <em>isn't text data</em>, again: that will simply give you gibberish that cannot be processed (it will have lost data when reading)</li>
</ul>
<p>Now, the second of these two problems is solvable by reading via the <code>Stream</code> API rather than the <code>StreamReader</code> API, so that you are only ever reading <em>binary</em>; you would then need to look for the line endings yourself, using an <code>Encoding</code> to decode what you can (noting that this isn't as simple as it sounds if you are using multi/variable-byte encodings such as UTF-8).</p>
<p>However, the first of these two problems is inherently <em>not solvable</em> by itself. To do this reliably, you would need some kind of binary framing protocol, which again does not exist in a text file. It looks like the example is using "mark" and "endmark" as sentinels; there is technically a chance that these would occur at random in the compressed data, but you'll <em>probably</em> get away with it for the 99.999% case. The trick, then, would be to read the entire file manually using <code>Stream</code> and <code>Encoding</code>, looking for "mark" and "endmark", and separating the bits that are encoded as text from the bits that are compressed data. Then run the encoded-as-text piece through the correct <code>Encoding</code>.</p>
<p>However!
Once you are reading binary, it is simple: you buffer the right amount (using whatever framing/sentinel protocol the data is written in), and use something like:</p>
<pre><code>using(var ms = new MemoryStream(bytes))
using(var inflate = new GZipStream(ms, CompressionMode.Decompress))
{
    // now read from 'inflate'
}
</code></pre>
<hr>
<p>With the addition of the <code>l 73</code> marker, and the information that it is ASCII, it becomes a little more viable.</p>
<p>This won't work <em>for me</em> because the data here on SO is already corrupted (posting binary as text does that), but <em>basically</em> something like:</p>
<pre><code>using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        using (var file = File.OpenRead("my.txt"))
        using (var buffer = new MemoryStream())
        {
            List<string> lines = new List<string>();
            string line;
            while ((line = ReadToCRLF(file, buffer)) != null)
            {
                lines.Add(line);
                Console.WriteLine(line);
                if (line == "mark" && lines.Count >= 2)
                {
                    // the line before "mark" carries the payload length, e.g. "l 73"
                    var match = Regex.Match(lines[lines.Count - 2], "^l ([0-9]+)$");
                    int bytes;
                    if (match.Success && int.TryParse(match.Groups[1].Value, out bytes))
                    {
                        ReadBytes(file, buffer, bytes);
                        string inflated = Inflate(buffer);
                        lines.Add(inflated); // or something similar
                        Console.WriteLine(inflated);
                    }
                }
            }
        }
    }

    // decompress the buffered payload and decode the result as ASCII text
    static string Inflate(Stream source)
    {
        // leaveOpen: true, so the shared buffer can be reused afterwards
        using (var deflate = new DeflateStream(source, CompressionMode.Decompress, true))
        using (var reader = new StreamReader(deflate, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }

    // read exactly 'count' raw bytes from 'source' into 'buffer'
    static void ReadBytes(Stream source, MemoryStream buffer, int count)
    {
        buffer.SetLength(count);
        int read, offset = 0;
        while (count > 0 && (read = source.Read(buffer.GetBuffer(), offset, count)) > 0)
        {
            count -= read;
            offset += read;
        }
        if (count != 0) throw new EndOfStreamException();
        buffer.Position = 0;
    }

    // read bytes up to the next CRLF and decode them as ASCII; null at end of file
    static string ReadToCRLF(Stream source, MemoryStream buffer)
    {
        buffer.SetLength(0);
        int next;
        bool wasCr = false;
        while ((next = source.ReadByte()) >= 0)
        {
            if (next == 10 && wasCr)
            {
                // CRLF: end of line (minus the CR)
                return Encoding.ASCII.GetString(
                    buffer.GetBuffer(), 0, (int)buffer.Length - 1);
            }
            buffer.WriteByte((byte)next);
            wasCr = next == 13;
        }
        // end of file
        if (buffer.Length == 0) return null;
        return Encoding.ASCII.GetString(buffer.GetBuffer(), 0, (int)buffer.Length);
    }
}
</code></pre>
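<hr>
<p>For the case where no length marker is available, the sentinel approach mentioned earlier (scanning the raw bytes for "mark" and "endmark") might be sketched roughly as below. This is only an illustration under assumptions: <code>SentinelSplit</code>, <code>IndexOf</code> and <code>ExtractPayload</code> are made-up names, and the exact byte layout (a <code>mark</code> line terminated by CRLF, then the payload, then CRLF and <code>endmark</code>) is a guess about the file format. It also inherits the caveat above: if the compressed data happens to contain the closing sentinel, the payload gets cut short.</p>
<pre><code>using System;
using System.Text;

static class SentinelSplit
{
    // index of the first occurrence of 'pattern' in 'data' at or after 'start', or -1
    static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // hypothetical helper: returns the raw bytes between "mark\r\n" and "\r\nendmark",
    // or null if either sentinel is missing (the sentinel layout is an assumption)
    public static byte[] ExtractPayload(byte[] data)
    {
        byte[] open = Encoding.ASCII.GetBytes("mark\r\n");
        byte[] close = Encoding.ASCII.GetBytes("\r\nendmark");

        int start = IndexOf(data, open, 0);
        if (start < 0) return null;
        start += open.Length;

        int end = IndexOf(data, close, start);
        if (end < 0) return null;

        byte[] payload = new byte[end - start];
        Buffer.BlockCopy(data, start, payload, 0, payload.Length);
        return payload;
    }
}
</code></pre>
<p>The returned bytes could then be wrapped in a <code>MemoryStream</code> and handed to the decompression stream, exactly as in the first snippet.</p>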
