Slow insert performance on sharded MongoDB cluster
I'm having trouble with slow insert performance on a sharded cluster. My setup consists of 5 shards, and each shard has at least 3 replica set members. As far as network topology goes, one group of RS members lives in Rackspace Cloud and the rest are on AWS. I'm running 2.4.6 on all of them.

I'm processing a file in Java and writing it to MongoDB. Each file is ~60MB and the resulting data for a file ends up as ~160MB in the DB. I'm connecting to a mongos from my Java application. I'm sharding on the hash of the _id (auto-generated ObjectId) and I have the write concern set to UNACKNOWLEDGED.

If I write to an unsharded collection I can write the whole file in ~90 seconds. If I write to a sharded collection it takes ~20 minutes!

I've done some initial debugging so far:

- I've tried creating a new collection and writing to it
- I've tried disabling the balancer to ensure that there were no migrations slowing things down (I've confirmed that the balancer is disabled)
- I don't see anything strange going on in the mongos or mongod logs

Things I've noticed:

- The primary node on the primary shard is sitting at an almost constant 80% write lock. The other primaries are hovering around 5% with occasional spikes to 30%. The secondaries are all sitting around 5% with occasional spikes to 15%.
- sh.status() shows even chunk distribution, but db.collection.stats() shows that the primary shard has a count & size twice as big as the other four shards.
- No other noticeable errors in the logs or MMS.

Any ideas on how I can further debug this issue?

Update: output from sh.status():

```
prod.collection
    shard key: { "_id" : "hashed" }
    chunks:
        rs1    8
        rs2    8
        rs3    8
        rs4    8
        rs0    8
    too many chunks to print, use verbose if you want to force print
```

And the output from db.collection.stats():

```
mongos> db.collection.stats()
{
    "sharded" : true,
    "ns" : "prod.collection",
    "count" : 879837,
    "numExtents" : 76,
    "size" : 2210698416,
    "storageSize" : 2653114368,
    "totalIndexSize" : 73526768,
    "indexSizes" : {
        "_id_" : 31526656,
        "_id_hashed" : 42000112
    },
    "avgObjSize" : 2512.6226971586784,
    "nindexes" : 2,
    "nchunks" : 20,
    "shards" : {
        "rs0" : {
            "ns" : "prod.collection",
            "count" : 300130,
            "size" : 754047552,
            "avgObjSize" : 2512.403131976144,
            "storageSize" : 873058304,
            "numExtents" : 17,
            "nindexes" : 2,
            "lastExtentSize" : 232005632,
            "paddingFactor" : 1.0000000000001465,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 24037440,
            "indexSizes" : {
                "_id_" : 9753968,
                "_id_hashed" : 14283472
            },
            "ok" : 1
        },
        "rs1" : {
            "ns" : "prod.collection",
            "count" : 139598,
            "size" : 350820064,
            "avgObjSize" : 2513.07371165776,
            "storageSize" : 470589440,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 127299584,
            "paddingFactor" : 1.000000000000052,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 11626272,
            "indexSizes" : {
                "_id_" : 5060944,
                "_id_hashed" : 6565328
            },
            "ok" : 1
        },
        "rs2" : {
            "ns" : "prod.collection",
            "count" : 149987,
            "size" : 376944272,
            "avgObjSize" : 2513.179622233927,
            "storageSize" : 470593536,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 127299584,
            "paddingFactor" : 1.0000000000000484,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 12713680,
            "indexSizes" : {
                "_id_" : 5674144,
                "_id_hashed" : 7039536
            },
            "ok" : 1
        },
        "rs3" : {
            "ns" : "prod.collection",
            "count" : 140235,
            "size" : 352293776,
            "avgObjSize" : 2512.167262095768,
            "storageSize" : 377905152,
            "numExtents" : 14,
            "nindexes" : 2,
            "lastExtentSize" : 104161280,
            "paddingFactor" : 1.0000000000000422,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 11863376,
            "indexSizes" : {
                "_id_" : 5110000,
                "_id_hashed" : 6753376
            },
            "ok" : 1
        },
        "rs4" : {
            "ns" : "prod.collection",
            "count" : 149887,
            "size" : 376592752,
            "avgObjSize" : 2512.5111050324576,
            "storageSize" : 460967936,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 124985344,
            "paddingFactor" : 1.000000000000043,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 13286000,
            "indexSizes" : {
                "_id_" : 5927600,
                "_id_hashed" : 7358400
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
```

Balancer status:

```
mongos> !sh.getBalancerState() && !sh.isBalancerRunning()
true
```
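Update: in case it helps, this is roughly what the Java write path looks like. It's a simplified sketch, not the exact code: the mongos host, document fields, and file handling here are illustrative. The relevant parts are real, though: I connect to a mongos, let the driver auto-generate the ObjectId for _id (the hashed shard key), and use WriteConcern.UNACKNOWLEDGED with the 2.x Java driver.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FileLoader {
    public static void main(String[] args) throws IOException {
        // Connect to a mongos router (host name is illustrative)
        MongoClient client = new MongoClient("mongos-host", 27017);
        DBCollection coll = client.getDB("prod").getCollection("collection");

        // Fire-and-forget writes: no per-insert acknowledgement from the server
        coll.setWriteConcern(WriteConcern.UNACKNOWLEDGED);

        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // One document per input line; _id is omitted so the driver
                // auto-generates an ObjectId, which is the hashed shard key.
                // The "raw" field is a placeholder for the parsed record.
                BasicDBObject doc = new BasicDBObject("raw", line);
                coll.insert(doc);
            }
        }
        client.close();
    }
}
```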
 
