**P.S.: this is the first time in my life that the Redis website is having problems, but I bet that by the time you visit it, they will have solved the problem.**

> We have data that comes in at a rate of about 10000 objects per second. This needs to be processed asynchronously, but loose ordering is sufficient, so each object is inserted round-robin-ly into one of several message queues (there are also several producers and consumers)

My first advice would be to look at [Redis](http://redis.io), because it is insanely fast and I bet you can handle all your messages with only a single message queue.

First I would like to show you some information about my laptop (I like it, but a big server is going to be a lot faster ;)). My dad (who was a little impressed :)) recently bought a new PC and it beats my laptop hard (8 CPUs instead of 2).

```
-Computer-
Processor        : 2x Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz
Memory           : 2051MB (1152MB used)
Operating System : Ubuntu 10.10
User Name        : alfred (alfred)

-Display-
Resolution       : 1920x1080 pixels
OpenGL Renderer  : Unknown
X11 Vendor       : The X.Org Foundation

-Multimedia-
Audio Adapter    : HDA-Intel - HDA Intel

-Input Devices-
Power Button
Lid Switch
Sleep Button
Power Button
AT Translated Set 2 keyboard
Microsoft Comfort Curve Keyboard 2000
Microsoft Comfort Curve Keyboard 2000
Logitech Trackball
Video Bus
PS/2 Logitech Wheel Mouse

-SCSI Disks-
HL-DT-ST DVDRAM GSA-T20N
ATA WDC WD1600BEVS-2
```

Below are the benchmarks using `redis-benchmark` on my machine, without even doing much Redis optimization:

```
alfred@alfred-laptop:~/database/redis-2.2.0-rc4/src$ ./redis-benchmark
====== PING (inline) ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

94.84% <= 1 milliseconds
98.74% <= 2 milliseconds
99.65% <= 3 milliseconds
100.00% <= 4 milliseconds
46296.30 requests per second

====== PING ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.30% <= 1 milliseconds
98.21% <= 2 milliseconds
99.29% <= 3 milliseconds
99.52% <= 4 milliseconds
100.00% <= 4 milliseconds
45662.10 requests per second

====== MSET (10 keys) ======
  10000 requests completed in 0.32 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

3.45% <= 1 milliseconds
88.55% <= 2 milliseconds
97.86% <= 3 milliseconds
98.92% <= 4 milliseconds
99.80% <= 5 milliseconds
99.94% <= 6 milliseconds
99.95% <= 9 milliseconds
99.96% <= 10 milliseconds
100.00% <= 10 milliseconds
30864.20 requests per second

====== SET ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

92.45% <= 1 milliseconds
98.78% <= 2 milliseconds
99.00% <= 3 milliseconds
99.01% <= 4 milliseconds
99.53% <= 5 milliseconds
100.00% <= 5 milliseconds
47169.81 requests per second

====== GET ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

94.50% <= 1 milliseconds
98.21% <= 2 milliseconds
99.50% <= 3 milliseconds
100.00% <= 3 milliseconds
47619.05 requests per second

====== INCR ======
  10000 requests completed in 0.23 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.90% <= 1 milliseconds
97.45% <= 2 milliseconds
98.59% <= 3 milliseconds
99.51% <= 10 milliseconds
99.78% <= 11 milliseconds
100.00% <= 11 milliseconds
44444.45 requests per second

====== LPUSH ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

95.02% <= 1 milliseconds
98.51% <= 2 milliseconds
99.23% <= 3 milliseconds
99.51% <= 5 milliseconds
99.52% <= 6 milliseconds
100.00% <= 6 milliseconds
47619.05 requests per second

====== LPOP ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

95.89% <= 1 milliseconds
98.69% <= 2 milliseconds
98.96% <= 3 milliseconds
99.51% <= 5 milliseconds
99.98% <= 6 milliseconds
100.00% <= 6 milliseconds
47619.05 requests per second

====== SADD ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.08% <= 1 milliseconds
97.79% <= 2 milliseconds
98.61% <= 3 milliseconds
99.25% <= 4 milliseconds
99.51% <= 5 milliseconds
99.81% <= 6 milliseconds
100.00% <= 6 milliseconds
45454.55 requests per second

====== SPOP ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.88% <= 1 milliseconds
98.64% <= 2 milliseconds
99.09% <= 3 milliseconds
99.40% <= 4 milliseconds
99.48% <= 5 milliseconds
99.60% <= 6 milliseconds
99.98% <= 11 milliseconds
100.00% <= 11 milliseconds
46296.30 requests per second

====== LPUSH (again, in order to bench LRANGE) ======
  10000 requests completed in 0.23 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.00% <= 1 milliseconds
97.82% <= 2 milliseconds
99.01% <= 3 milliseconds
99.56% <= 4 milliseconds
99.73% <= 5 milliseconds
99.77% <= 7 milliseconds
100.00% <= 7 milliseconds
44247.79 requests per second

====== LRANGE (first 100 elements) ======
  10000 requests completed in 0.39 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

6.24% <= 1 milliseconds
75.78% <= 2 milliseconds
93.69% <= 3 milliseconds
97.29% <= 4 milliseconds
98.74% <= 5 milliseconds
99.45% <= 6 milliseconds
99.52% <= 7 milliseconds
99.93% <= 8 milliseconds
100.00% <= 8 milliseconds
25906.74 requests per second

====== LRANGE (first 300 elements) ======
  10000 requests completed in 0.78 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

1.30% <= 1 milliseconds
5.07% <= 2 milliseconds
36.42% <= 3 milliseconds
72.75% <= 4 milliseconds
93.26% <= 5 milliseconds
97.36% <= 6 milliseconds
98.72% <= 7 milliseconds
99.35% <= 8 milliseconds
100.00% <= 8 milliseconds
12886.60 requests per second

====== LRANGE (first 450 elements) ======
  10000 requests completed in 1.10 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

0.67% <= 1 milliseconds
3.64% <= 2 milliseconds
8.01% <= 3 milliseconds
23.59% <= 4 milliseconds
56.69% <= 5 milliseconds
76.34% <= 6 milliseconds
90.00% <= 7 milliseconds
96.92% <= 8 milliseconds
98.55% <= 9 milliseconds
99.06% <= 10 milliseconds
99.53% <= 11 milliseconds
100.00% <= 11 milliseconds
9066.18 requests per second

====== LRANGE (first 600 elements) ======
  10000 requests completed in 1.48 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

0.85% <= 1 milliseconds
9.23% <= 2 milliseconds
11.03% <= 3 milliseconds
15.94% <= 4 milliseconds
27.55% <= 5 milliseconds
41.10% <= 6 milliseconds
56.23% <= 7 milliseconds
78.41% <= 8 milliseconds
87.37% <= 9 milliseconds
92.81% <= 10 milliseconds
95.10% <= 11 milliseconds
97.03% <= 12 milliseconds
98.46% <= 13 milliseconds
99.05% <= 14 milliseconds
99.37% <= 15 milliseconds
99.40% <= 17 milliseconds
99.67% <= 18 milliseconds
99.81% <= 19 milliseconds
99.97% <= 20 milliseconds
100.00% <= 20 milliseconds
6752.19 requests per second
```

As you can hopefully see from benchmarking my simple laptop, you probably just need one message queue, because Redis can handle 10000 [lpush](http://redis.io/commands/lpush) requests in 0.23 seconds and 10000 [lpop](http://redis.io/commands/lpop) requests in 0.21 seconds. When you only need one queue, I believe your problem is not a problem anymore (or are the producers themselves producing the duplicates? I don't completely understand that part).
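To make the single-queue idea concrete, here is a minimal sketch using the redis-py client (my assumption; any Redis client works). The queue name `objects`, the JSON payload format and the `process()` stub are only illustrative:

```python
# Minimal sketch of the single-queue idea: all producers LPUSH onto one
# list, all consumers BRPOP from it. Names and payload format are made up.
import json

import redis

r = redis.Redis(host="localhost", port=6379)

def process(obj):
    print("processing", obj["id"])    # stand-in for your real processing

def produce(obj_id, payload):
    # Every producer pushes onto the same list; Redis keeps the order.
    r.lpush("objects", json.dumps({"id": obj_id, "payload": payload}))

def consume_forever():
    while True:
        # BRPOP blocks until an item is available, so consumers just loop.
        _, raw = r.brpop("objects")
        process(json.loads(raw))
```

LPUSH on one end and BRPOP on the other gives FIFO behaviour across any number of producers and consumers, which is more than the loose ordering the question asks for.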
> And it needs to be durable, so the MQs are configured to persist to disk.

Redis also persists to disk.

> The problem is that often these objects are duplicated. They do have 10-byte unique ids. It's not catastrophic if objects are duplicated in the queue, but it is if they're duplicated in the processing after being taken from the queue. What's the best way to go about ensuring as close as possible to linear scalability whilst ensuring there's no duplication in the processing of the objects?

When you use a single message queue (a single box), this problem does not exist, if I understand correctly. But if not, you could simply check whether the id [is a member of your set of ids](http://redis.io/commands/sismember). When you have processed the id, you should [remove it from the set of ids](http://redis.io/commands/srem). Of course, you first have to add the members to the set using [sadd](http://redis.io/commands/sadd).
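Here is a small sketch of that check, following the sadd/sismember/srem steps described above. The set name `seen_ids` and the `process()` stub are illustrative; in redis-py the return value of `SADD` already tells you whether the id was new, so it can stand in for a separate `SISMEMBER` call:

```python
# Duplicate check on the consumer side, following the steps above.
# SADD returns 1 only for a first-time member, so it doubles as the
# SISMEMBER check in a single atomic step.
import json

import redis

r = redis.Redis()

def process(obj):
    print("processing", obj["id"])        # stand-in for your real work

def consume_once():
    _, raw = r.brpop("objects")           # take the next object off the queue
    obj = json.loads(raw)
    if r.sadd("seen_ids", obj["id"]) == 0:
        return                            # another consumer already took this id: skip it
    process(obj)
    r.srem("seen_ids", obj["id"])         # processed: drop the id from the set again
```

Whether you remove the id right after processing, as above, or keep it around for a while depends on how far apart the duplicates can arrive.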
If one box does not scale anymore, you should shard your keys over multiple boxes and do that check on the box the key maps to (there is a small sketch of this right after the links). To learn more about this, I think you should read the following:

- [Redis replication and redis sharding (cluster) difference](https://stackoverflow.com/questions/2139443/redis-replication-and-redis-sharding-cluster-difference/2322731#2322731)
- http://antirez.com/post/redis-presharding.html
- http://redis.io/presentation/Redis_Cluster.pdf
- http://blog.zawodny.com/2011/02/26/redis-sharding-at-craigslist/
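As a rough illustration of what client-side sharding could look like: the two hosts and the crc32-mod scheme below are my own assumptions for the sketch, not a recommendation; the links above describe more robust approaches such as presharding.

```python
# Toy client-side sharding: pick a Redis box from the object id.
import zlib

import redis

shards = [
    redis.Redis(host="10.0.0.1", port=6379),
    redis.Redis(host="10.0.0.2", port=6379),
]

def shard_for(obj_id):
    # The same id always lands on the same box, so the duplicate check
    # (SADD/SISMEMBER/SREM) only ever has to look at one shard.
    return shards[zlib.crc32(obj_id.encode()) % len(shards)]

def already_seen(obj_id):
    return shard_for(obj_id).sadd("seen_ids", obj_id) == 0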
> And perhaps linked to that, should the whole object be stored in the message queue, or only the id with the body stored in something like cassandra?

If possible, you should store all your information directly in memory, because nothing runs as fast as memory (okay, your [cache](http://en.wikipedia.org/wiki/Cache#CPU_cache) memory is even faster, but it is really, really small, and you can't access it from your code). Redis stores all your information in memory and makes snapshots to disk. I think you should be able to keep everything in memory and skip something like Cassandra altogether.

Let's say each object is 400 bytes in total, at a rate of 10000 objects per second => 4,000,000 bytes per second for all objects => 4 MB/s, if my calculation is correct. You could easily store that amount of information in memory. If you can't, you should really consider upgrading your memory if at all possible, because memory isn't that expensive anymore.
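Writing that back-of-the-envelope calculation out (the 400-byte object size is the assumption from the paragraph above, and the one-hour backlog figure is only there to show a worst case):

```python
# Back-of-the-envelope numbers from the paragraph above.
object_size = 400                      # bytes per object (assumed above)
rate = 10_000                          # objects per second
bytes_per_second = object_size * rate  # 4_000_000 bytes/s

print(bytes_per_second / 1e6, "MB/s")        # 4.0 MB/s
# Even if consumers fell a full hour behind, the backlog would only be:
print(bytes_per_second * 3600 / 1e9, "GB")   # 14.4 GB
```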
 
