<p>Right; the problem is that this isn't <em>just</em> protobuf - it is a hybrid file format (<a href="http://wiki.openstreetmap.org/wiki/ProtocolBufBinary" rel="noreferrer">defined here</a>) that <em>includes</em> protobuf among various formats internally. It also incorporates compression (although that looks to be optional).</p>

<p>I've pulled apart what I can from the spec, and I've got a C# reader here that uses protobuf-net to handle the chunks - it happily reads through that file to the end - I can tell you there are 4515 blocks (<code>BlockHeader</code>). When it gets to the <code>Blob</code> I'm a bit confused as to how the spec demarcates <code>OSMHeader</code> and <code>OSMData</code> - I'm open to suggestions here! I've also used <a href="http://www.componentace.com/download/download.php?editionid=25" rel="noreferrer">ZLIB.NET</a> to handle the zlib compression that is being used. Until I get my head around that, I've settled for decompressing the zlib data and validating its length against the claimed size, to check it is at least sane.</p>

<p>If you can figure out (or ask the author) how they are separating <code>OSMHeader</code> and <code>OSMData</code> I'll happily crank something else in. I hope you don't mind that I've stopped here - but it has been a few hours ;p</p>

<pre><code>using System;
using System.IO;
using OpenStreetMap; // where my .proto-generated entities are living
using ProtoBuf; // protobuf-net
using zlib; // ZLIB.NET

class OpenStreetMapParser
{
    static void Main()
    {
        using (var file = File.OpenRead("us-northeast.osm.pbf"))
        {
            // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
            //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
            //The format is a repeating sequence of:
            //int4: length of the BlockHeader message in network byte order
            //serialized BlockHeader message
            //serialized Blob message (size is given in the header)

            int length, blockCount = 0;
            while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
            {
                // I'm just being lazy and re-using something "close enough" here
                // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                // actually need the other way around (network byte order):
                uint len = (uint)length;
                len = ((len &amp; 0xFF) &lt;&lt; 24) | ((len &amp; 0xFF00) &lt;&lt; 8) | ((len &amp; 0xFF0000) &gt;&gt; 8) | ((len &amp; 0xFF000000) &gt;&gt; 24);
                length = (int)len;

                BlockHeader header;
                // again, v2 has capped-streams built in, but I'm deliberately
                // limiting myself to v1 features
                using (var tmp = new LimitedStream(file, length))
                {
                    header = Serializer.Deserialize&lt;BlockHeader&gt;(tmp);
                }
                Blob blob;
                using (var tmp = new LimitedStream(file, header.datasize))
                {
                    blob = Serializer.Deserialize&lt;Blob&gt;(tmp);
                }
                if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                using (var ms = new MemoryStream(blob.zlib_data))
                using (var zlib = new ZLibStream(ms))
                {
                    // at this point I'm very unclear how the OSMHeader and OSMData are packed - it isn't clear
                    // read this to the end, to check we can parse the zlib
                    int payloadLen = 0;
                    while (zlib.ReadByte() &gt;= 0) payloadLen++;
                    if (payloadLen != blob.raw_size) throw new FormatException("Screwed that up...");
                }
                blockCount++;
                Console.WriteLine("Read block " + blockCount.ToString());
            }
            Console.WriteLine("all done");
            Console.ReadLine();
        }
    }
}

// read-only, forward-only stream base; subclasses supply ReadNextBlock
abstract class InputStream : Stream
{
    protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);
    public sealed override int Read(byte[] buffer, int offset, int count)
    {
        // keep reading until the request is filled or the source runs dry
        int bytesRead, totalRead = 0;
        while (count &gt; 0 &amp;&amp; (bytesRead = ReadNextBlock(buffer, offset, count)) &gt; 0)
        {
            count -= bytesRead;
            offset += bytesRead;
            totalRead += bytesRead;
            pos += bytesRead;
        }
        return totalRead;
    }
    long pos;
    public override void Write(byte[] buffer, int offset, int count) { throw new NotImplementedException(); }
    public override void SetLength(long value) { throw new NotImplementedException(); }
    public override long Position
    {
        get { return pos; }
        set { if (pos != value) throw new NotImplementedException(); }
    }
    public override long Length { get { throw new NotImplementedException(); } }
    public override void Flush() { throw new NotImplementedException(); }
    public override bool CanWrite { get { return false; } }
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); }
}

class ZLibStream : InputStream
{
    // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
    private ZInputStream reader; // seriously, why isn't this a stream?
    public ZLibStream(Stream stream)
    {
        reader = new ZInputStream(stream);
    }
    public override void Close()
    {
        reader.Close();
        base.Close();
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
        return reader.read(buffer, offset, count);
    }
}

// deliberately doesn't dispose the base-stream
class LimitedStream : InputStream
{
    private Stream stream;
    private long remaining;
    public LimitedStream(Stream stream, long length)
    {
        if (length &lt; 0) throw new ArgumentOutOfRangeException("length");
        if (stream == null) throw new ArgumentNullException("stream");
        if (!stream.CanRead) throw new ArgumentException("stream");
        this.stream = stream;
        this.remaining = length;
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // never read past the end of the capped region
        if (count &gt; remaining) count = (int)remaining;
        int bytesRead = stream.Read(buffer, offset, count);
        if (bytesRead &gt; 0) remaining -= bytesRead;
        return bytesRead;
    }
}
</code></pre>
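<p>The spec excerpt quoted in the code comments describes the length prefix as a 4-byte integer in network byte order, which the listing handles by reading it little-endian and then swapping the bytes. As an alternative, here is a minimal sketch that reads the prefix big-endian directly from the stream. <code>NetworkOrder</code> and <code>TryReadInt32</code> are hypothetical names introduced for illustration; they are not part of protobuf-net or the OSM spec.</p>

```csharp
using System;
using System.IO;

// Hypothetical helper (not from protobuf-net): reads a 4-byte
// big-endian ("network byte order") integer from a stream.
static class NetworkOrder
{
    // Returns false on a clean end-of-stream (no bytes available);
    // throws if the stream ends part-way through the 4 bytes.
    public static bool TryReadInt32(Stream source, out int value)
    {
        value = 0;
        byte[] buf = new byte[4];
        int read = 0;
        while (read < 4)
        {
            int n = source.Read(buf, read, 4 - read);
            if (n <= 0) break; // end of stream
            read += n;
        }
        if (read == 0) return false;
        if (read < 4) throw new EndOfStreamException("Truncated length prefix");

        // most significant byte first
        value = (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
        return true;
    }
}
```

<p>With this, the bytes <code>00 00 00 0D</code> decode to 13, and the loop condition could call <code>NetworkOrder.TryReadInt32(file, out length)</code> in place of the read-and-swap, assuming nothing else relies on protobuf-net having consumed the prefix.</p>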