    <p>When coming from a SQL environment, it is easy to treat a bucket as a table and store small individual records there, often relying on secondary indexes to get the data out. Because Riak is a key-value store that uses consistent hashing, however, this is often not the most efficient or scalable approach.</p>
    <p>A lookup based on key in Riak allows the partitions holding the data to be directly identified, and the coordinating node can directly query these partitions. When querying a <a href="http://docs.basho.com/riak/latest/dev/advanced/2i/" rel="nofollow">secondary index</a>, Riak does not know on which partitions the matching data resides. It therefore needs to send the query to a large number of partitions to ensure that all matching objects can be found. This is known as a 'coverage query' and means that, assuming an n_val of 3 is used for the bucket, at least 1/3 of all partitions need to be queried. This generally leads to higher load on the cluster and does not scale as well as direct key lookups. Latencies also tend to be higher.</p>
    <p>When using Riak, it is therefore often recommended that you structure your data so that you can use direct key lookups as much as possible, e.g. through de-normalization.</p>
    <p>If your messages/posts can be grouped in some way, e.g. by user or conversation, it may make sense to store them in a single object representing this grouping instead of as separate objects.</p>
    <p>If we assume that your posts can consist of either text or images and are linked to a conversation thread, you could create an object representing the conversation thread. This would contain information about the conversation as well as a list of posts. Each entry in this list can contain, for example, the id of the poster, a timestamp and the key of the record containing the post. If the post is a reasonably short text message, the entry may even contain the entire post, reducing the number of records that need to be fetched.</p>
    <p>As posts come in to this conversation, the record is updated and the list of posts gets longer. It may be wise to set <code>allow_mult</code> to true in order to enable siblings, as this will allow you to handle concurrent writes. This approach allows you to always get the conversation as well as the latest posts through a single direct key lookup.</p>
    <p>Riak works best when the size of objects is kept below a couple of MB. You will therefore need to move the oldest posts off to a separate object at some point to keep the size in check. If you keep a list of these related objects in the main conversation object, possibly together with some information about the time interval they cover, you can also reach them through direct key lookups if you need to scroll back over older posts.</p>
    <p>As the most common query is usually for the most recent entries, it can always be fulfilled through the main conversation object.</p>
    <p>I would also like to point out that we have a very active <a href="http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com" rel="nofollow">mailing list</a> where these kinds of issues are discussed quite frequently.</p>
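
The conversation object described above could be sketched roughly as follows. This is only an illustration under stated assumptions: a plain Python dict stands in for the Riak bucket, and the field names, the key format for archive objects, and the two thresholds are hypothetical choices rather than anything prescribed by Riak. In a real application the threshold would more likely be based on the serialized object size than on a post count.

```python
import json

# Hypothetical stand-in for a Riak bucket: a plain dict mapping key -> JSON string.
store = {}

MAX_INLINE_POSTS = 4  # assumed limit on posts kept in the main object
ARCHIVE_BATCH = 2     # assumed number of oldest posts moved out at a time


def new_conversation(conv_id, title):
    """Create the main conversation object with empty post and archive lists."""
    return {"id": conv_id, "title": title, "posts": [], "archives": []}


def add_post(conv, poster_id, timestamp, text):
    """Append a short text post inline; archive the oldest posts when the list grows.

    Assumes posts arrive in timestamp order.
    """
    conv["posts"].append({"poster": poster_id, "ts": timestamp, "text": text})
    if len(conv["posts"]) > MAX_INLINE_POSTS:
        # Move the oldest posts into a separate object reachable by key.
        overflow = conv["posts"][:ARCHIVE_BATCH]
        conv["posts"] = conv["posts"][ARCHIVE_BATCH:]
        archive_key = f"{conv['id']}:archive:{len(conv['archives'])}"
        store[archive_key] = json.dumps(overflow)
        # Record the key and the time interval it covers in the main object.
        conv["archives"].append(
            {"key": archive_key, "from": overflow[0]["ts"], "to": overflow[-1]["ts"]}
        )
    # One write of the main object; the latest posts stay one key lookup away.
    store[conv["id"]] = json.dumps(conv)
    return conv
```

Reading the latest posts is then a single lookup of the conversation key, and scrolling back means following the entries in `archives`, each of which is again a direct key lookup.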
    1. Thank you so much for the extensive reply. You just stopped me from using secondary indexes for any kind of has_many/one relation. In another thread on SO, it was suggested to use a linked list approach, using the 'Link' header to mimic the previous/next chain, and to have a 'first' link on the user or conversation. This would require N lookups for N posts. When N is small (< 10), are timings acceptable for a web application? The only alternative is your grouping, but *allow_mult* scares me. I'd need to merge the two siblings. Keeping the posts isolated would prevent double writes (99.999%).
    2. Secondary indexes have many uses, including indicating the parent object in one-to-many relations. The important thing is to design the model so that this is not the primary access method, especially if you have an application with a high read-to-write ratio. I would also recommend not trying to maintain a linked list using links or references in the object, as this can easily break if you have concurrent updates/inserts and/or network partitions.
    3. Enabling siblings does not have to be scary. In this case you could treat the list of posts as a set. If you encounter siblings, you perform a set union between the available sets, then sort the result by timestamp before writing it back to Riak. If, however, you have a very high posting frequency resulting in frequent updates, it can be difficult to resolve the siblings properly. In this case it is often recommended to funnel all the writes/updates through a single thread (or a small number of threads) to reduce the risk of sibling explosion.
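
The set-union resolution described in the last comment could look something like the sketch below. It works on plain Python lists and does not use any Riak client API. The post fields (`poster`, `ts`, `key`) and the choice to treat that triple as a post's identity are assumptions made for illustration.

```python
def resolve_siblings(siblings):
    """Merge sibling post lists: set union, then sort by timestamp.

    `siblings` is a list of post lists, one per sibling value.
    """
    merged = {}
    for posts in siblings:
        for post in posts:
            # Assumed identity of a post: (poster, timestamp, key).
            # The same post seen in several siblings collapses to one entry.
            merged[(post["poster"], post["ts"], post["key"])] = post
    # Sort by timestamp; poster and key break ties so the order is deterministic.
    return sorted(merged.values(), key=lambda p: (p["ts"], p["poster"], p["key"]))
```

The merged list would then be written back as the single resolved value. How siblings are fetched and how the resolved value is stored depends on the client library and is left out here.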
 
