How can I troubleshoot the reason for my MongoDB server suddenly taking 100% CPU?
<p>I'm about ready to go live with my node.js/mongo app running on the Amazon Cloud. I have a three-member replica set for the Mongo servers. Everything was working fine until, suddenly, about 20 minutes ago, the PRIMARY mongo server jumped to 100% CPU usage (usually it has barely any usage). I'm currently testing the app with only ~10 users, so this is very worrisome.</p>
<p>My first reaction was, of course, to grab the mongodb log file from the server. I expected this to be revealing, but now I'm more confused than ever. One of the primary functions of my database is to cache data for users, so I have a Collection ('DataCache') which simply stores a JSON string (Mongoose code):</p>
<pre><code>new Model('DataCache', {
  '_id': { type: String, unique: true },
  'data': String,
  'updated': Date
});
</code></pre>
<p>Looking at the logs from the "100% CPU" period, I see that the standard update requests were performed, but they took as long as ~47 seconds!</p>
<pre><code>Mon Aug 6 08:58:36 [conn28821] update storage.datacache query: { _id: "14954006/mentions/dcc3c69e72da714a0f3bffc518183ebb" } update: { $set: ... } } 47174ms </code></pre>
<p>The data in this request was no larger than usual (about 1000 characters in the JSON string; the data is truncated here for brevity).</p>
<p>I'm really not sure where else to look to figure out why my CPU usage suddenly jumped so high. I can't imagine what was unusual or unique about this scenario, and I don't see anything else in the logs, but I'm very worried about what will happen when our 10 users scale to thousands...</p>
<p>The problem disappeared as suddenly as it appeared, about 20 minutes after starting, but the CPU is still seeing weird spikes (RightScale dashboard image): <img src="https://i.stack.imgur.com/0YHuQ.png" alt="RightScale"></p>
<hr>
<p>UPDATE: Here's some info printed from mongo about the cache collection in particular. I'm not certain that the problem has to do with the cache collection, but it is the one query I was seeing most consistently during the lag time...</p>
<pre><code>{
  "ns" : "storage.datacache",
  "count" : 43949,
  "size" : 132274592,
  "avgObjSize" : 3009.729277116658,
  "storageSize" : 158887936,
  "numExtents" : 13,
  "nindexes" : 5,
  "lastExtentSize" : 33828864,
  "paddingFactor" : 1.0099999999994833,
  "flags" : 1,
  "totalIndexSize" : 10972192,
  "indexSizes" : {
    "_id_" : 4570384,
  },
  "ok" : 1
}
</code></pre>
<hr>
<p>EDIT: More graphs <img src="https://i.stack.imgur.com/XNGAe.png" alt="enter image description here"> <img src="https://i.stack.imgur.com/kLPf8.png" alt="enter image description here"></p>
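<p>To check whether the cache updates really are the only slow operations, the log can be filtered for long-running lines. The following is a minimal sketch, assuming every operation line has the same layout as the log excerpt above (timestamp, <code>[connN]</code>, operation, namespace, details, duration in ms). The helper name <code>findSlowOps</code> and the 1000 ms threshold are hypothetical choices for illustration, not part of any MongoDB tool.</p>

```javascript
// Sketch: pull slow operations out of mongod log text.
// Assumes the line layout shown in the log excerpt above:
//   <timestamp> [connN] <op> <namespace> ... <duration>ms
const LINE_RE = /\[(conn\d+)\]\s+(\w+)\s+(\S+)\s(?:.*\s)?(\d+)ms\s*$/;

// findSlowOps is a hypothetical helper name used only for this example.
function findSlowOps(logText, thresholdMs) {
  const slow = [];
  for (const line of logText.split('\n')) {
    const m = LINE_RE.exec(line);
    if (!m) continue; // not an operation line in the assumed layout
    const ms = Number(m[4]);
    if (ms >= thresholdMs) {
      slow.push({ conn: m[1], op: m[2], ns: m[3], ms });
    }
  }
  // Slowest operations first
  return slow.sort((a, b) => b.ms - a.ms);
}

// The single log line quoted in the question, used as sample input.
const sample =
  'Mon Aug 6 08:58:36 [conn28821] update storage.datacache query: ' +
  '{ _id: "14954006/mentions/dcc3c69e72da714a0f3bffc518183ebb" } ' +
  'update: { $set: ... } } 47174ms';

console.log(findSlowOps(sample, 1000));
```

<p>Grouping the result by <code>ns</code> and <code>op</code> would show whether the slow lines cluster on <code>storage.datacache</code> or are spread across collections, which in turn hints at whether the collection itself or the server as a whole was stalled.</p>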