Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>This is a partial answer, assuming that your patterns you are trying to match against are all either constant strings with no wildcards in them, or a sequence of strings separated by wilcards "*" that can match any string.</p> <p>This problem has been studied quite a bit in the context of implementing network-based and host-based intrusion detection systems, where you have a bunch of patterns you are looking for in network traffic, where each pattern might be a sign of an intruder sending attack traffic at you.</p> <p>In the special case where there are no wildcards at all in the patterns, and your set of patterns is changing infrequently, so you can afford to spend some time doing some precomputation of data structures when they change, a well-known way to do this is the Aho-Corasick algorithm:</p> <p><a href="http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm</a></p> <p>If you then want to generalize to allow wildcards, the following ideas might not have good worst-case performance, but would likely perform well in practice. Break up patterns that have wildcards in them into the "constant string" parts, e.g. break up the pattern "developer.<em>.com" into "developer." and ".com". Put those two strings in the list of ones you are searching for separately. Only if a URL coming in matches both developer. and .com would you then do some more post-processing to make sure it had them both in the desired order (as opposed to in the opposite order, like "a.com.developer.foo" would, and should thus not match the pattern "developer.</em>.com").</p> <p>For large sets of patterns, Aho-Corasick can require lots of memory to store the state-machine that it represents. There have been other similar methods designed later to improve on it. For example, Google for the paper title "Advanced Algorithms for Fast and Scalable Deep Packet Inspection" by Kumar, Turner, and Williams.</p> <p>I am aware other methods of solving this, too, which are patented by Cisco Systems. If there is any chance your company would license these methods, or already has some kind of bulk cross-licensing agreement with Cisco, I'd be happy to tell you more about those.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload