Duplication detection for 3K incoming requests per second, recommended data structure/algorithm?
    <p>I am designing a system where a service endpoint (probably a simple servlet) will have to handle 3K requests per second (the data will be HTTP POSTed).</p>
    <p>These requests will then be stored in MySQL.</p>
    <p><strong>The key issue I need guidance on is that there will be a high percentage of duplicate data posted to this endpoint.</strong></p>
    <p>I only need to store unique data in MySQL, so what would you suggest I use to handle the duplication?</p>
    <p>The posted data will look like:</p>
    <pre><code>&lt;root&gt; &lt;prop1&gt;&lt;/prop1&gt; &lt;prop2&gt;&lt;/prop2&gt; &lt;prop3&gt;&lt;/prop3&gt; &lt;body&gt; maybe 10-30K of text in here &lt;/body&gt; &lt;/root&gt; </code></pre>
    <p>I will write a method that hashes prop1, prop2 and prop3 to create a unique hashcode (body can be different and still be considered unique).</p>
    <p><em>I was thinking of creating some sort of concurrent dictionary that will be shared across requests.</em></p>
    <p><strong>Duplicates are most likely to be posted within a period of 24 hours</strong>, so I can purge data from this dictionary every x hours.</p>
    <p>Any suggestions on the data structure to use for detecting duplicates? And what about purging, and how many records should I store? At 3K requests per second it will get large very fast.</p>
    <p>Note: There are 10K different sources that will be posting, and duplication only occurs within a given source. This means I could have more than one dictionary, perhaps one per group of sources, to spread things out. If source1 posts data and then source2 posts data, the chances of duplication are very, very low. But if source1 posts 100 times in a day, the chances of duplication are very high.</p>
    <p><strong>Note: please ignore for now the task of saving the posted data to MySQL, as that is a separate issue. Duplication detection is the first hurdle I need help with.</strong></p>
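
To make the idea concrete, here is a minimal sketch of the shared concurrent dictionary described above, not a definitive implementation. The class name `DedupCache`, its methods, the choice of SHA-256 and the inclusion of the source in the key are all assumptions made for illustration. It records when each key was first seen, treats a key as new again once the purge window has passed, and offers a `purge` method intended to be called periodically.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers recently seen (source, prop1, prop2, prop3) keys. */
class DedupCache {
    // Key digest -> time the key was first seen, in milliseconds.
    private final ConcurrentHashMap<String, Long> seen = new ConcurrentHashMap<>();
    private final long ttlMillis;

    DedupCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Builds a digest over the source and the three properties (body is ignored). */
    static String key(String source, String prop1, String prop2, String prop3) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String s : new String[] {source, prop1, prop2, prop3}) {
                byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
                // Length prefix so that ("ab", "c") and ("a", "bc") produce different keys.
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the key has not been seen within the TTL, i.e. the post should be stored. */
    boolean firstSeen(String key, long nowMillis) {
        boolean[] fresh = {false};
        // compute() runs atomically per key, so concurrent requests cannot both win.
        seen.compute(key, (k, firstSeenAt) -> {
            if (firstSeenAt == null || nowMillis - firstSeenAt >= ttlMillis) {
                fresh[0] = true;
                return nowMillis;
            }
            return firstSeenAt;
        });
        return fresh[0];
    }

    /** Drops entries older than the TTL; intended to be called from a periodic task. */
    void purge(long nowMillis) {
        seen.entrySet().removeIf(e -> nowMillis - e.getValue() >= ttlMillis);
    }

    int size() {
        return seen.size();
    }
}
```

A request handler would call `firstSeen(DedupCache.key(source, prop1, prop2, prop3), System.currentTimeMillis())` and only pass the post on for storage when it returns true. For sizing, 3,000 requests per second over 24 hours is 3,000 × 86,400 = 259,200,000 requests, which is an upper bound on the number of entries for a 24-hour window; the real count depends on the duplicate rate. Because duplicates only occur within a source, several such caches could be kept, one per group of sources, as suggested above.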
 
