    <p>Serialization is a touchy topic in C++...</p> <p><strong>Quick question:</strong> do you need serialization or messaging?</p> <ul> <li>Serialization: short-lived structure, one encoder/decoder</li> <li>Messaging: longer life, encoders / decoders in multiple languages</li> </ul> <p>Both are useful, and each has its place.</p> <p><a href="http://www.boost.org/doc/libs/1_41_0/libs/serialization/doc/index.html" rel="noreferrer">Boost.Serialization</a> is usually the most recommended library for serialization, though the odd choice of <code>operator&amp;</code>, which serializes or deserializes depending on whether the archive is saving or loading, is really an abuse of operator overloading to me.</p> <p>For messaging, I would rather suggest <a href="http://code.google.com/p/protobuf/" rel="noreferrer">Google Protocol Buffers</a>. They offer a clean syntax for describing the message and generate encoders and decoders for a huge variety of languages. There is also another advantage when performance matters: the format allows lazy deserialization (i.e. deserializing only part of the blob at a time) by design.</p> <p><strong>Moving on</strong></p> <p>Now, as for the details of implementation, it really depends on what you want.</p> <ul> <li>You need <strong>versioning</strong>, even for regular serialization: you'll probably need backward compatibility with the previous version anyway.</li> <li>You may, or may not, need a system of <code>tag</code> + <code>factory</code>. It's only necessary for polymorphic classes. And you will then need one <code>factory</code> per inheritance tree (<code>kind</code>)... the code can be templatized of course!</li> <li>Pointers / References are going to bite you in the ass... they reference a position in memory that changes after deserialization. I usually take a tangential approach: each object of each <code>kind</code> is given an <code>id</code>, unique within its <code>kind</code>, and so I serialize the <code>id</code> rather than a pointer. 
Some frameworks handle it as long as you don't have circular dependencies and you serialize the objects pointed to / referenced first.</li> </ul> <p>Personally, I try as much as I can to separate the serialization / deserialization code from the actual code that runs the class. In particular, I try to isolate it in the source files so that changes to this part of the code do not break binary compatibility.</p> <p><em>On versioning</em></p> <p>I usually try to keep the serialization and deserialization of one version close together. It's easier to check that they are truly symmetric. I also try to abstract the version handling directly in my serialization framework, along with a few other things, because DRY should be adhered to :)</p> <p><em>On error-handling</em></p> <p>To ease error detection, I usually use a pair of 'markers' (special bytes) to separate one object from another. It allows me to throw immediately during deserialization because I can detect a desynchronization of the stream (i.e. something ate too many bytes or did not eat enough).</p> <p>If you want permissive deserialization, i.e. deserializing the rest of the stream even if something failed before, you'll have to move toward byte-counts: each object is preceded by its byte-count and can only eat that many bytes (and is expected to eat them all). This approach is nice because it allows for partial deserialization: you can save the part of the stream required for an object and only deserialize it if necessary.</p> <p>Tagging (your class IDs) is useful here, not (only) for dispatching, but simply to check that you are actually deserializing the right type of object. 
It also allows for pretty error messages.</p> <p>Here are some error messages / exceptions you may want:</p> <ul> <li><code>No version X for object TYPE: only Y and Z</code></li> <li><code>Stream is corrupted: here are the next few bytes BBBBBBBBBBBBBBBBBBB</code></li> <li><code>TYPE (version X) was not completely deserialized</code></li> <li><code>Trying to deserialize a TYPE1 in TYPE2</code></li> </ul> <p>Note that, as far as I remember, both <code>Boost.Serialization</code> and <code>protobuf</code> really help with error/version handling.</p> <p><code>protobuf</code> has some perks too, because of its ability to nest messages:</p> <ul> <li>the byte-count is naturally supported, as well as the versioning</li> <li>you can do lazy deserialization (i.e. store the message and only deserialize it if someone asks for it)</li> </ul> <p>The downside is that it's harder to handle polymorphism because of the fixed format of the message. You have to design your messages carefully for that.</p>
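<p>The byte-count and tagging ideas above can be sketched roughly as follows. This is only a minimal illustration, not how Boost.Serialization or protobuf lay out their data: the <code>Frame</code> struct, the <code>writeFrame</code> / <code>readFrame</code> helpers, the header layout (1-byte tag, 1-byte version, 4-byte little-endian byte-count) and the <code>kMaxPayload</code> limit are all assumptions made up for the example.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical frame layout (an assumption for this sketch):
// [tag: 1 byte][version: 1 byte][byte-count: 4 bytes, little-endian][payload]
struct Frame {
    std::uint8_t tag;      // identifies the type of the serialized object
    std::uint8_t version;  // format version of the payload
    std::string payload;   // already-encoded object bytes
};

// Arbitrary sanity limit so a corrupted byte-count cannot trigger a huge allocation.
const std::uint32_t kMaxPayload = 1u << 20;

void writeFrame(std::ostream& out, const Frame& f) {
    assert(f.payload.size() <= kMaxPayload);
    const std::uint32_t n = static_cast<std::uint32_t>(f.payload.size());
    out.put(static_cast<char>(f.tag));
    out.put(static_cast<char>(f.version));
    for (int i = 0; i < 4; ++i)  // byte-count, least significant byte first
        out.put(static_cast<char>((n >> (8 * i)) & 0xFF));
    out.write(f.payload.data(), static_cast<std::streamsize>(n));
}

Frame readFrame(std::istream& in, std::uint8_t expectedTag) {
    char header[6];
    if (!in.read(header, 6))
        throw std::runtime_error("Stream is corrupted: truncated header");

    Frame f;
    f.tag = static_cast<std::uint8_t>(header[0]);
    f.version = static_cast<std::uint8_t>(header[1]);
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(static_cast<std::uint8_t>(header[2 + i])) << (8 * i);
    if (n > kMaxPayload)
        throw std::runtime_error("Stream is corrupted: implausible byte-count");

    // Consume the whole payload BEFORE validating the tag: even if this
    // object is rejected, the stream is positioned at the next frame.
    f.payload.resize(n);
    if (n > 0 && !in.read(&f.payload[0], static_cast<std::streamsize>(n)))
        throw std::runtime_error("Stream is corrupted: truncated payload");

    if (f.tag != expectedTag)
        throw std::runtime_error("Trying to deserialize a tag " + std::to_string(f.tag) +
                                 " in tag " + std::to_string(expectedTag));
    return f;
}
```

<p>Because <code>readFrame</code> eats exactly the announced number of bytes before it checks the tag, a caller can catch the exception and keep reading the following objects, which is the permissive behaviour described above. The stored <code>payload</code> can also be kept as-is and decoded later, which gives the lazy deserialization for free.</p>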