StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>Knowing what you need to query is one of, if not the single most, important consideration when planning a Cassandra schema. Cassandra is designed to handle a high volume of writes. </p> <p>Depending on what other features you want to implement, you may need to create additional column families or replace these entirely to store the data in the manner most optimized for cassandra to query.</p> <p>Further, I recommend storing as much of the data in natively in cassandra as is possible. I wouldn't load a JSON object into a fat string column. At the very least, store pertinent data directly in cassandra, such as values of likes, etc.</p> <p>You have two domain models, User and Song, and three classes of data you want to store:</p> <ol> <li>Comments</li> <li>Likes/Dislikes/Favorites</li> <li>Views/Downloads</li> </ol> <p>And you detail some functional queries needed for your update algorithm:</p> <ol> <li>Check if user already liked that item</li> <li>Check if user disliked that item before</li> <li>If he did dislike, then remove that entry</li> <li>Now get current likes</li> <li>And now update item itself, set new likes count <ul> <li>using cassandra counters, both these steps can happen at once</li> </ul></li> <li>Update CF which contains all user who liked that item</li> </ol> <p>A schema that can satisfy those query requirements follows. </p> <p>First, we'll define the CFs for <strong>User and Song</strong>, both of which use UUID as keys.</p> <pre><code>create column family users with comparator=UTF8Type and column_metadata=[{column_name: user_name, validation_class: UTF8Type, index_type: KEYS}, {column_name: json_data, validation_class: UTF8Type}]; create column family songs with comparator=UTF8Type and column_metadata=[{column_name: user_name, validation_class: UTF8Type, index_type: KEYS}, {column_name: json_data, validation_class: UTF8Type}]; </code></pre> <p>The secondary index helps to retrieve the user row by username. See <a href="http://www.datastax.com/docs/1.0/ddl/indexes" rel="nofollow">this</a> for performance considerations.</p> <hr> <p><strong>Comments</strong> could be modeled using a UUID for each comment as key as follows:</p> <pre><code>create column family comments with comparator = 'UTF8Type' and column_metadata=[{column_name: user, validation_class: UUIDType, index_type: KEYS}, {column_name: song, validation_class: UUIDType, index_type: KEYS}, {column_name: timestamp, validation_class: DateType}, {column_name: comment, validation_class: UTF8Type}]; </code></pre> <hr> <p>Since <strong>user's likes and dislikes</strong> are mutually exclusive, we can use one column family to store all the user's song likes/dislikes. If you constrain Favorite to imply like, we can do all three with just this one. Using the user's UUID as the row key, and song's UUID as the column key, a column value of 0 => no value, 1 => dislike, 2 => like, 3 => favorite.</p> <pre><code>create column family user_likes with comparator = 'UUIDType' and default_validation_class = IntegerType; </code></pre> <hr> <p>All that's left now is tracking the total <strong>likes, dislikes, favorites, views, and downloads</strong> per song. We can use <a href="http://wiki.apache.org/cassandra/Counters" rel="nofollow">Cassandra's counter column</a> type to accomplish this in one CF. Use the song's UUID as the CF key.</p> <pre><code>create column family song_data with default_validation_class=CounterColumnType and column_metadata=[{column_name: likes}, {column_name: dislikes}, {column_name: favorites}, {column_name: views}, {column_name: downloads}]; </code></pre>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload