StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>The following is the solution to this problem I developed based on Donald Miner's research paper, implemented in Python, for rules applied to MAC addresses.</p> <p>Basically, the most specific match is from the pattern that is not a superset of any other matching pattern. For a particular problem domain, you create a series of tests (functions) which compare two REs and return which is the superset, or if they are orthogonal. This lets you build a tree of matches. For a particular input string, you go through the root patterns and find any matches. Then go through their subpatterns. If at any point, orthogonal patterns match, an error is raised.</p> <p><strong>Setup</strong></p> <pre><code>import re class RegexElement: def __init__(self, string,index): self.string=string self.supersets = [] self.subsets = [] self.disjoints = [] self.intersects = [] self.maybes = [] self.precompilation = {} self.compiled = re.compile(string,re.IGNORECASE) self.index = index SUPERSET = object() SUBSET = object() INTERSECT = object() DISJOINT = object() EQUAL = object() </code></pre> <p><strong>The Tests</strong></p> <p>Each test takes 2 strings (a and b) and tries to determine how they are related. If the test cannot determine the relation, None is returned.</p> <p><code>SUPERSET</code> means <code>a</code> is a superset of <code>b</code>. All matches of <code>b</code> will match <code>a</code>.</p> <p><code>SUBSET</code> means <code>b</code> is a superset of <code>a</code>.</p> <p><code>INTERSECT</code> means some matches of <code>a</code> will match <code>b</code>, but some won't and some matches of <code>b</code> won't match <code>a</code>.</p> <p><code>DISJOINT</code> means no matches of <code>a</code> will match <code>b</code>.</p> <p><code>EQUAL</code> means all matches of <code>a</code> will match <code>b</code> and all matches of <code>b</code> will match <code>a</code>.</p> <pre><code> def equal_test(a, b): if a == b: return EQUAL </code></pre> <p><strong>The graph</strong></p> <pre><code> class SubsetGraph(object): def __init__(self, tests): self.regexps = [] self.tests = tests self._dirty = True self._roots = None @property def roots(self): if self._dirty: r = self._roots = [i for i in self.regexps if not i.supersets] return r return self._roots def add_regex(self, new_regex): roots = self.roots new_re = RegexElement(new_regex) for element in roots: self.process(new_re, element) self.regexps.append(new_re) def process(self, new_re, element): relationship = self.compare(new_re, element) if relationship: getattr(self, 'add_' + relationship)(new_re, element) def add_SUPERSET(self, new_re, element): for i in element.subsets: i.supersets.add(new_re) new_re.subsets.add(i) element.supersets.add(new_re) new_re.subsets.add(element) def add_SUBSET(self, new_re, element): for i in element.subsets: self.process(new_re, i) element.subsets.add(new_re) new_re.supersets.add(element) def add_DISJOINT(self, new_re, element): for i in element.subsets: i.disjoints.add(new_re) new_re.disjoints.add(i) new_re.disjoints.add(element) element.disjoints.add(new_re) def add_INTERSECT(self, new_re, element): for i in element.subsets: self.process(new_re, i) new_re.intersects.add(element) element.intersects.add(new_re) def add_EQUAL(self, new_re, element): new_re.supersets = element.supersets.copy() new_re.subsets = element.subsets.copy() new_re.disjoints = element.disjoints.copy() new_re.intersects = element.intersects.copy() def compare(self, a, b): for test in self.tests: result = test(a.string, b.string) if result: return result def match(self, text, strict=True): matches = set() self._match(text, self.roots, matches) out = [] for e in matches: for s in e.subsets: if s in matches: break else: out.append(e) if strict and len(out) > 1: for i in out: print(i.string) raise Exception("Multiple equally specific matches found for " + text) return out def _match(self, text, elements, matches): new_elements = [] for element in elements: m = element.compiled.match(text) if m: matches.add(element) new_elements.extend(element.subsets) if new_elements: self._match(text, new_elements, matches) </code></pre> <p><strong>Usage</strong></p> <pre><code>graph = SubsetGraph([equal_test, test_2, test_3, ...]) graph.add_regex("00:11:22:..:..:..") graph.add_regex("..(:..){5,5}" graph.match("00:de:ad:be:ef:00") </code></pre> <p>A complete usable version is <a href="https://gist.github.com/perkinslr/209f892b2d760dae184147f902d5f3ae" rel="nofollow noreferrer">here</a>.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload