Note that there are some explanatory texts on larger screens.

plurals
  1. POHow to design a string matching algorithm where the input is the exact string and the lookup values are regex-ish?
    text
    copied!<p>This is a design question.</p> <p>Background: We get a web request into our system from many different websites (for a widget that we give out), from which we grab the referrer string (if it exists). We use the referrer to decide on some things within the application. The problem arises in that I need to look at a list of "sites" (urls, partial urls, urls containing wildcards) in order to determine what to do. This list could be on the order of many thousands of sites. I need to be able to ask something like a "Site Service" (or whatever) if the referrer is a match with anything in the site list. I need to do this fast, say 5-10ms, give or take a few ms, and get a positive or negative result back.</p> <p>Here is a basic example:</p> <p>Request - Referrer = <a href="http://www.stackoverflow.com/users/120262?tab=accounts">http://www.stackoverflow.com/users/120262?tab=accounts</a></p> <p>Site List Could Contain urls like:</p> <ul> <li><code>users.stackoverflow.com</code> -- (not a match)</li> <li><code>www.stackoverflow.com/users</code> -- (match)</li> <li><code>www.stackoverflow.com/users/120262</code> -- (match)</li> <li><code>www.stackoverflow.com/users/*</code> -- (match)</li> <li><code>*/users/*</code> -- (match)</li> <li><code>www.stackoverflow.com/users/239289</code> -- (not a match)</li> <li><code>*.stackoverflow.com/questions/ask</code> -- (not a match)</li> <li><code>*/questions/*</code> -- (not a match)</li> <li><code>www.stackoverflow.com</code> -- (match)</li> <li><code>www.msdn.com</code> -- (not a match)</li> <li><code>*.msdn.com</code> -- (not a match)</li> <li><code>developer.*.com</code> -- (not a match)</li> </ul> <p>You get the idea...</p> <p>The issue I am dealing with is how to handle this in a performant and scalable way. </p> <p>Performant in that I need to make a decision fast so that I can move on to the real processing that needs to happen. </p> <p>Scalable in that the list of thousands of "sites" is setup for each affiliate that we have and they each may have many site lists, making for thousands of site lists containing thousands of sites.</p> <p>I'm willing to consider pretty much anything here as I am just in the initial (re)design phase of this. Any and all thoughts are welcome including solution suggestions, general patterns to look into, existing tools even.</p> <p>Thanks.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload