# Architecture of processing incoming requests in a service
I'm designing a server daemon for a project that has to accept a large number of simultaneous requests and process them asynchronously. I'm aware of the sheer scale of such a project, but I'm serious about it and am trying to produce a clear design and plan before going further.

Here's a list of my goals:

- Scalability - must be able to parallelize the architecture onto multiple processor cores or even multiple servers.
- Ability to cope with a huge number of parallel connections.
- A single request that takes a long time to process must not block anything else.
- Request-to-response turnaround time must be minimal.
- Built on the .NET Framework (I'll be writing this in C#).

My proposed architecture and flow is rather complicated, so here's a chart of my initial design:

![Architecture Flow Chart](https://i.stack.imgur.com/pxIqp.png)

(and [here it is on tinypic](http://i39.tinypic.com/2lwm8oh.png) in case it resizes badly)

The idea is that requests come in via the network (though I haven't yet decided whether TCP or UDP is best) and are passed immediately to a high-speed load balancer. The load balancer selects a request queue (RQ) to place each request in, using a weighted random number generator; the weights are derived from the size of each queue. The reason for using a weighted RNG, rather than simply picking the least busy queue, is that it prevents an empty but blocked queue (due to a hung request) from locking up the whole server. If all RQs exceed a certain size, the load balancer drops the request and places a "server too busy" response into the output queue (OPQ) - *this part isn't shown in the diagram*. (A sketch of the selection logic appears below.)

Each queue corresponds to a thread whose affinity is set to one CPU core on the server (also sketched below). These threads form the parallel request processor, which consumes requests from each queue. Requests are categorized into one of three types:

1. **Immediate** - processed immediately, as the name suggests.

2. **Deferrable** - considered low priority. They are processed immediately during low load, or placed into the deferred request queue (DRQ) if load is high. The load balancer fetches these deferred requests from the DRQ, marks them as immediate, then places them back into appropriate RQs.

3. **Timed** - placed into the timed request queue (TRQ) along with their target timestamp (sketched below). These requests are often generated as a result of another request, rather than being explicitly sent in by a client. When a request's timestamp is exceeded, the next available request processor thread consumes and processes it.

When a request is processed, data may be fetched from a key/value cache in memory, from a key/value cache on disk, or from a dedicated SQL database server. The cached values will be BSON, indexed by string keys. I'm thinking of using `Dictionary<TKey,TValue>` for the in-memory cache (see the note below) and a B-tree (or similar) for the disk cache.

The response is created when processing is complete, and it is placed into the output queue (OPQ). A loop then consumes responses from the OPQ and transmits them back to the client over the network.

If the OPQ reaches 80% of its maximum size, a quarter of the request processor threads are halted. If it reaches 90%, half of the request processor threads are halted. If it reaches its maximum size, all request processor threads are halted. This will be achieved with a semaphore, which should also prevent individual request processor threads from getting blocked and leaving stale requests. (One way to realize this is sketched below.)
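To make the load-balancing step concrete, here is a minimal sketch of the weighted selection, assuming one dedicated load-balancer thread (so the non-thread-safe `Random` is acceptable) and a placeholder `Request` type; all names here are illustrative, not part of the design above. Each queue's weight is its remaining capacity, so a backed-up queue is chosen less and less often, and never once it's full:

```csharp
using System;
using System.Collections.Concurrent;

// Placeholder for whatever the real request type ends up being.
class Request { }

// Sketch: choose an RQ with probability proportional to its remaining
// capacity. A full queue gets weight zero and is never chosen; if every
// queue is full, Select() returns null and the caller should place a
// "server too busy" response on the OPQ instead.
class WeightedQueueSelector
{
    private readonly ConcurrentQueue<Request>[] _queues;
    private readonly int _maxQueueSize;
    private readonly Random _rng = new Random(); // fine if only the balancer thread calls Select()

    public WeightedQueueSelector(ConcurrentQueue<Request>[] queues, int maxQueueSize)
    {
        _queues = queues;
        _maxQueueSize = maxQueueSize;
    }

    public ConcurrentQueue<Request> Select()
    {
        var weights = new int[_queues.Length];
        int total = 0;
        for (int i = 0; i < _queues.Length; i++)
        {
            weights[i] = Math.Max(0, _maxQueueSize - _queues[i].Count);
            total += weights[i];
        }
        if (total == 0) return null; // all queues full

        int ticket = _rng.Next(total); // uniform in [0, total)
        for (int i = 0; i < _queues.Length; i++)
        {
            if (ticket < weights[i]) return _queues[i];
            ticket -= weights[i];
        }
        return null; // unreachable: the ticket always lands in some bucket
    }
}
```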
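Pinning a thread to one core is slightly awkward, since .NET exposes no direct affinity API for managed threads. One known workaround, sketched here under the assumption of a default CLR host on Windows, is to resolve the underlying OS thread and set its affinity mask:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

// Sketch of pinning the calling thread to a single core.
// BeginThreadAffinity asks the CLR host not to migrate the managed
// thread to a different OS thread, so the affinity mask stays valid.
static class ThreadPinning
{
    [DllImport("kernel32.dll")]
    private static extern uint GetCurrentThreadId();

    public static void PinCurrentThreadToCore(int coreIndex)
    {
        Thread.BeginThreadAffinity();
        uint osThreadId = GetCurrentThreadId();
        foreach (ProcessThread pt in Process.GetCurrentProcess().Threads)
        {
            if (pt.Id == osThreadId)
            {
                // Single-bit mask; assumes fewer than 31 cores for simplicity.
                pt.ProcessorAffinity = new IntPtr(1 << coreIndex);
                break;
            }
        }
    }
}
```

Whether hard affinity actually wins anything is worth benchmarking; the Windows scheduler is already reasonably good at keeping busy threads on the same core.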
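For the TRQ, one simple option is to keep requests ordered by due time and let idle processor threads poll for anything that has fallen due. A minimal sketch (reusing the placeholder `Request` type from the first sketch, with a plain lock for simplicity; names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the timed request queue (TRQ): requests ordered by target
// timestamp. A Queue per timestamp handles requests due at the same instant.
class TimedRequestQueue
{
    private readonly object _lock = new object();
    private readonly SortedDictionary<DateTime, Queue<Request>> _due =
        new SortedDictionary<DateTime, Queue<Request>>();

    public void Schedule(Request request, DateTime dueUtc)
    {
        lock (_lock)
        {
            Queue<Request> bucket;
            if (!_due.TryGetValue(dueUtc, out bucket))
                _due[dueUtc] = bucket = new Queue<Request>();
            bucket.Enqueue(request);
        }
    }

    // Called by an available processor thread; returns a request whose
    // due time has passed, or null if nothing is due yet.
    public Request TryDequeueDue(DateTime nowUtc)
    {
        lock (_lock)
        {
            if (_due.Count == 0) return null;
            var earliest = _due.First();            // smallest key = soonest due time
            if (earliest.Key > nowUtc) return null; // nothing has fallen due
            var request = earliest.Value.Dequeue();
            if (earliest.Value.Count == 0) _due.Remove(earliest.Key);
            return request;
        }
    }
}
```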
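One note on the in-memory cache: a plain `Dictionary<TKey,TValue>` isn't safe for concurrent writers, so with several processor threads touching it you'd need external locking; `ConcurrentDictionary` (available since .NET 4) handles that internally. A minimal sketch, with the BSON documents stored as raw byte arrays purely for illustration:

```csharp
using System.Collections.Concurrent;

// Sketch of the in-memory cache tier: string keys mapping to serialized
// BSON documents (a BsonDocument type from whichever BSON library is
// chosen would work equally well as the value type).
class BsonMemoryCache
{
    private readonly ConcurrentDictionary<string, byte[]> _entries =
        new ConcurrentDictionary<string, byte[]>();

    public void Put(string key, byte[] bsonDocument)
    {
        _entries[key] = bsonDocument; // atomic insert-or-overwrite
    }

    public bool TryGet(string key, out byte[] bsonDocument)
    {
        return _entries.TryGetValue(key, out bsonDocument);
    }
}
```

A real cache would also need an eviction policy (size cap, LRU, or TTL) so the working set doesn't grow without bound, plus a miss path that falls through to the disk tier and then to the SQL server.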
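Finally, one way the semaphore-based throttling could work, sketched under the assumption that each processor thread holds a permit while working and a monitor thread periodically samples the OPQ fill level; the 80%/90%/100% thresholds match those above, and all names are illustrative:

```csharp
using System.Threading;

// Sketch of the OPQ throttle: every processor thread holds a permit
// while it works; a monitor thread confiscates permits as the OPQ
// fills, idling a matching fraction of threads, and hands them back
// as the queue drains.
class OutputQueueThrottle
{
    private readonly SemaphoreSlim _permits;
    private readonly int _totalThreads;
    private int _confiscated; // touched only by the monitor thread

    public OutputQueueThrottle(int processorThreadCount)
    {
        _totalThreads = processorThreadCount;
        _permits = new SemaphoreSlim(processorThreadCount, processorThreadCount);
    }

    // Processor threads wrap each unit of work in these two calls.
    public void BeginWork() { _permits.Wait(); }
    public void EndWork()   { _permits.Release(); }

    // Monitor thread calls this periodically with the OPQ fill ratio.
    public void Adjust(double opqFillRatio)
    {
        int target;
        if (opqFillRatio >= 1.0)      target = _totalThreads;     // halt all
        else if (opqFillRatio >= 0.9) target = _totalThreads / 2; // halt half
        else if (opqFillRatio >= 0.8) target = _totalThreads / 4; // halt a quarter
        else                          target = 0;

        // Grab free permits until enough threads are idled; Wait(0) never
        // blocks, so permits held by busy threads are simply picked up on
        // a later Adjust() call once those threads finish.
        while (_confiscated < target && _permits.Wait(0))
            _confiscated++;

        // Return permits as the OPQ drains.
        while (_confiscated > target)
        {
            _permits.Release();
            _confiscated--;
        }
    }
}
```

Because permits are only confiscated between requests, no thread is ever stopped mid-request; a busy thread simply fails to obtain a new permit once it finishes its current one.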
What I'm looking for are suggestions in a few areas:

- Are there any major flaws in this architecture that I've missed?
- Is there anything I should consider changing for performance reasons?
- Would TCP or UDP be more appropriate for requests? The "proof of delivery" that TCP gives would be very useful, but the lightweight nature of UDP is appealing too.
- Are there any special considerations when dealing with 100k+ simultaneous connections on a Windows server? I know Linux's TCP stack copes well, but I'm not so sure about Windows.
- Are there any other questions I should be asking? Have I forgotten to consider anything?

I know this was a lot to read, and probably quite a lot to ask too, so thank you for your time.

**Updated version of the diagram [here](http://i43.tinypic.com/w6t7r4.png).**