StackOverflow2013


plurals

POHow should I implement simple caches with concurrency on Redis?
primarykey
Id
19756654
data
AcceptedAnswerId
19758606
AnswerCount
1
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-11-03T18:50:52.630
FavoriteCount
0
LastActivityDate
2013-11-03T21:51:48.520
LastEditDate
LastEditorUserId
0
OwnerUserId
251199
ParentId
0
PostTypeId
1
Score
2
ViewCount
739
LastEditorDisplayName
text
Body
<h1>Background</h1>
I have a 2-tier web service: just my app server and an RDBMS. I want to move to a pool of identical app servers behind a load balancer. I currently cache a bunch of objects in-process, and I hope to move them to a shared Redis.

I have a dozen or so caches of simple, small business objects. For example, I have a set of <code>Foos</code>. Each <code>Foo</code> has a unique <code>FooId</code> and an <code>OwnerId</code>. One "owner" may own multiple <code>Foos</code>. In a traditional RDBMS this is just a table with an index on the PK <code>FooId</code> and one on <code>OwnerId</code>. I'm caching this in one process simply:
<pre><code>Dictionary&lt;int,Foo&gt; _cacheFooById;
Dictionary&lt;int,HashSet&lt;int&gt;&gt; _indexFooIdsByOwnerId;
</code></pre>
Reads come straight from here, and writes go here and to the RDBMS.

I usually have this invariant: "For a given group [say by OwnerId], the whole group is in cache or none of it is." So when I get a cache miss on a Foo, I pull that Foo and all the owner's other Foos from the RDBMS. Updates make sure to keep the index up to date and respect the invariant. When an owner calls <code>GetMyFoos</code> I never have to worry that some are cached and some aren't.
<h1>What I did already</h1>
The first/simplest answer seems to be to use plain ol' <code>SET</code> and <code>GET</code> with a composite key and a JSON value:
<pre><code>SET( "ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
</code></pre>
I later decided I liked:
<pre><code>HSET( "ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
</code></pre>
That lets me get all the values in one cache with <code>HVALS</code>. It also felt right: I'm literally moving hashtables to Redis, so perhaps my top-level items should be hashes. This works to first order.
If my high-level code looks like:
<pre><code>UpdateCache(theFoo);
AddToIndex(theFoo);
</code></pre>
that translates into:
<pre><code>HSET ("ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
var myFoos = JsonDeserialize( HGET ("ServiceCache:FooIndex", theFoo.OwnerId) );
myFoos.Add(theFoo.FooId);
HSET ("ServiceCache:FooIndex", theFoo.OwnerId, JsonSerialize(myFoos));
</code></pre>
However, this is broken in two ways.
<ol>
<li>Two concurrent operations can read/modify/write at the same time. The latter "wins" the final <code>HSET</code> and the former's index update is lost.</li>
<li>Another operation could read the index after the first <code>HSET</code> but before the index is updated. It would miss a Foo that it should find.</li>
</ol>
<h2>So how do I index properly?</h2>
I think I could use a Redis set instead of a JSON-encoded value for the index. That would solve part of the problem, since the "add-to-index-if-not-already-present" would be atomic.

I also read about using <code>MULTI</code> as a "transaction", but it doesn't seem to do what I want. Am I right that I can't really <code>MULTI; HGET; {update}; HSET; EXEC</code>, since it doesn't even do the <code>HGET</code> before I issue the <code>EXEC</code>?

I also read about using <code>WATCH</code> and <code>MULTI</code> for optimistic concurrency, then retrying on failure. But <code>WATCH</code> only works on top-level keys, so it's back to <code>SET/GET</code> instead of <code>HSET/HGET</code>. And now I need a new index-like thing to support getting all the values in a given cache.

If I understand it right, I can combine all these things to do the job.
Something like:
<pre><code>while(!succeeded)
{
    WATCH( "ServiceCache:Foo:" + theFoo.FooId );
    WATCH( "ServiceCache:FooIndexByOwner:" + theFoo.OwnerId );
    WATCH( "ServiceCache:FooIndexAll" );
    MULTI();
    SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
    SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
    SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
    EXEC();
    //TODO somehow set succeeded properly
}
</code></pre>
Finally, I'd have to translate this pseudocode into real code depending on how my client library uses <code>WATCH/MULTI/EXEC</code>; it looks like they need some sort of context to hook them together.

All in all this seems like a lot of complexity for what has to be a very common case; I can't help but think there's a better, smarter, Redis-ish way to do things that I'm just not seeing.
<h2>How do I lock properly?</h2>
Even if I had no indexes, there's still a (probably rare) race condition.
<pre><code>A: HGET - cache miss
B: HGET - cache miss
A: SELECT
B: SELECT
A: HSET
C: HGET - cache hit
C: UPDATE
C: HSET
B: HSET ** this is stale data that's clobbering C's update.
</code></pre>
Note that C could just be a really-fast A.

Again I think <code>WATCH</code>, <code>MULTI</code>, retry would work, but... ick.

I know in some places people use special Redis keys as locks for other objects. Is that a reasonable approach here? Should those be top-level keys like <code>ServiceCache:FooLocks:{Id}</code> or <code>ServiceCache:Locks:Foo:{Id}</code>? Or should I make a separate hash for them: <code>ServiceCache:Locks</code> with subkeys <code>Foo:{Id}</code>, or <code>ServiceCache:Locks:Foo</code> with subkeys <code>{Id}</code>? How would I work around abandoned locks, say if a transaction (or a whole server) crashes while "holding" the lock?
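The lost-update problem described in the question (two clients doing read/modify/write on a JSON-encoded index) can be reproduced without a server. The following is only a minimal Python sketch: `FakeRedis` is a hypothetical in-memory stand-in written for this illustration, not a real client library, and the two "clients" are interleaved by hand rather than run concurrently. It contrasts the JSON-in-a-hash index with a set-based index, where each add is a single operation with no prior read.

```python
import json


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis server (illustration only)."""

    def __init__(self):
        self.hashes = {}  # key -> {field: string value}
        self.sets = {}    # key -> set of members

    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def read_index(r, owner_id):
    """Read the JSON-encoded list of FooIds for one owner (empty if missing)."""
    raw = r.hget("ServiceCache:FooIndex", owner_id)
    return json.loads(raw) if raw is not None else []


def write_index(r, owner_id, foo_ids):
    """Overwrite the whole JSON-encoded index entry for one owner."""
    r.hset("ServiceCache:FooIndex", owner_id, json.dumps(sorted(foo_ids)))


def lost_update_demo():
    """Interleave two read/modify/write cycles; the second write wins."""
    r = FakeRedis()
    owner = 7
    a_view = read_index(r, owner)   # client A reads []
    b_view = read_index(r, owner)   # client B reads [] before A writes
    a_view.append(101)
    b_view.append(102)
    write_index(r, owner, a_view)   # index is now [101]
    write_index(r, owner, b_view)   # index is now [102]; 101 is lost
    return read_index(r, owner)


def set_index_demo():
    """Same two additions using a set: each add needs no prior read."""
    r = FakeRedis()
    key = "ServiceCache:FooIndexByOwner:7"
    r.sadd(key, 101)  # client A
    r.sadd(key, 102)  # client B
    return r.smembers(key)


if __name__ == "__main__":
    print(lost_update_demo())        # the JSON index keeps only one FooId
    print(sorted(set_index_demo()))  # the set index keeps both
```

With the JSON-encoded index, only the FooId written last survives; with the set, both additions are kept regardless of ordering. This only illustrates the index half of the problem and says nothing about the stale-write race between cache fills and updates.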
Tags
<concurrency><indexing><redis>
Title
How should I implement simple caches with concurrency on Redis?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USsolublefish
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POHow should I implement simple caches with concurrency on Redis?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POHow should I implement simple caches with concurrency on Redis?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
