plurals
  1. PO
    primarykey
    data
    text
    <p>dan wilkerson, simon goldsmith, et al. designed a thorough <a href="http://arxiv.org/abs/1106.3325" rel="nofollow">global transaction system</a> on top of app engine's local (per entity group) transactions. at a high level, it uses techniques similar to the GUID one you describe. dan dealt with "submarine writes," i.e. the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. erick armbrust implemented dan's design in <a href="http://code.google.com/p/tapioca-orm/" rel="nofollow">tapioca-orm</a>.</p>
    <p>i don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.</p>
    <p>in response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. it's only important when you need transactions with certain kinds of guarantees like the ones you describe. it's definitely important to understand when you do need them, but you often don't.</p>
    <p>the datastore is implemented on top of megastore, which is described in depth <a href="http://research.google.com/pubs/pub36971.html" rel="nofollow">in this paper</a>. in short, it uses <a href="http://en.wikipedia.org/wiki/Multiversion_concurrency_control" rel="nofollow">multi-version concurrency control</a> within each entity group and <a href="http://en.wikipedia.org/wiki/Paxos_%28computer_science%29" rel="nofollow">Paxos</a> for replication across datacenters, both of which can contribute to submarine writes. i don't know if there are public numbers on submarine write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.</p>
    <p>amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. amazon's SimpleDB is comparable. it originally only provided <a href="http://aws.amazon.com/simpledb/#eventually-consistent" rel="nofollow">eventual consistency</a>, and eventually added a very limited kind of transactions they call <a href="http://aws.amazon.com/simpledb/#consistent" rel="nofollow">conditional writes</a>, but it doesn't have true transactions. other NoSQL databases (redis, mongo, couchdb, etc.) have different variations on transactions and consistency.</p>
    <p>basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. this is best known from eric brewer's <a href="http://en.wikipedia.org/wiki/CAP_theorem" rel="nofollow">CAP theorem</a>, which frames the three axes of the tradeoff as consistency, availability, and partition tolerance.</p>
    1. CO What is interesting is that the paper describes a submarine write as a situation where a write happens but reads return stale data. To me that seems like less of a concern. The more problematic issue is that, according to the App Engine documentation, a submarine write causes an exception to be thrown, so the client will think it has to retry.
      singulars
    2. CO Another thing that is interesting is that this paper seems to contradict what Guido van Rossum says in the link above: submarine writes in particular seem to be specific to App Engine, and he says specifically that they come from an optimization decision made by the App Engine team. So the CAP theorem applies in general, but submarine writes are specifically App Engine's issue. He also makes the important point that transaction order is never compromised, so if redoing a transaction would cause an error, you shouldn't have a problem (seems like a shortcut around true idempotency).
      singulars
    3. CO To clarify the second comment: let's say you're creating a record and the record's key is fully determined by information sent by the client (for example, a registration page would send a username which becomes the key). Assume a submarine write and a spurious error of type "retry" (such as ConcurrentModificationException). The client retries, the record already exists, a different error is thrown which is NOT of type retry, and the user sees an error but is in fact registered. Not the most user-friendly result but at least your data isn't corrupted, and submarines are rare, right?
      singulars
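The registration scenario in the last comment can be sketched in a few lines. This is a minimal illustration only, not App Engine code: `FakeStore`, `RetryableError`, `AlreadyExistsError`, and `register` are hypothetical names, and the store is an in-memory stand-in that applies the first write and then reports a spurious failure. It assumes the GUID-style approach mentioned in the answer: each logical request carries one token, so a retry that hits "already exists" can check whether the stored record is its own earlier write.

```python
import uuid


class RetryableError(Exception):
    """Stand-in for a 'please retry' error such as ConcurrentModificationException."""


class AlreadyExistsError(Exception):
    """Stand-in for the non-retryable 'record already exists' error."""


class FakeStore:
    """In-memory stand-in for a datastore (hypothetical, NOT the App Engine API)."""

    def __init__(self, submarine_on_first_write=False):
        self.records = {}
        self._submarine = submarine_on_first_write

    def create(self, key, value):
        if key in self.records:
            raise AlreadyExistsError(key)
        self.records[key] = value
        if self._submarine:
            # Submarine write: the write is applied, but failure is reported.
            self._submarine = False
            raise RetryableError("spurious failure")

    def get(self, key):
        return self.records.get(key)


def register(store, username, max_attempts=3):
    """Create a record keyed by username; return True if this request registered it."""
    token = uuid.uuid4().hex  # one token per logical request, reused across retries
    for _ in range(max_attempts):
        try:
            store.create(username, {"token": token})
            return True
        except RetryableError:
            continue  # outcome unknown: the write may or may not have landed
        except AlreadyExistsError:
            existing = store.get(username)
            # Our own token means an earlier attempt surfaced after reporting failure.
            return existing is not None and existing["token"] == token
    return False  # attempts exhausted without a definite answer
```

With `FakeStore(submarine_on_first_write=True)`, the first call to `register(store, "alice")` sees the spurious failure, retries, finds its own token, and reports success instead of showing the user an error. A second request for the same username carries a different token and is correctly rejected.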
 
