Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>One way to solve this is with a genetic algorithm. I happen to have a <a href="https://github.com/handcraftsman/Scratch/tree/master/src/Scratch/GeneticAlgorithm" rel="noreferrer">genetic solver laying around</a> so I applied it to your problem with the following algorithm:</p> <ul> <li>get the distinct tokens from the desired inputs as genes</li> <li>add the Regex specials to the genes</li> <li>for the fitness algorithm <ul> <li>make sure the generated string is a valid regex</li> <li>get a fitness value based on how many desired things it matches and how many undesired things it matches</li> </ul></li> <li>until a successful Regex is found <ul> <li>starting at the number of distinct tokens and incrementing as necessary</li> <li>try to generate a Regex of that length that passes the fitness requirement</li> </ul></li> </ul> <p>Here's my implementation in C#</p> <pre><code>private static void GenerateRegex(IEnumerable&lt;string&gt; target, IEnumerable&lt;string&gt; dontMatch) { string distinctSymbols = new String(target.SelectMany(x =&gt; x).Distinct().ToArray()); string genes = distinctSymbols + "?*()+"; Func&lt;string, uint&gt; calcFitness = str =&gt; { if (str.Count(x =&gt; x == '(') != str.Count(x =&gt; x == ')')) { return Int32.MaxValue; } if ("?*+".Any(x =&gt; str[0] == x)) { return Int32.MaxValue; } if ("?*+?*+".ToArray().Permute(2) .Any(permutation =&gt; str.IndexOf(new string(permutation.ToArray())) != -1)) { return Int32.MaxValue; } Regex regex; try { regex = new Regex("^" + str + "$"); } catch (Exception) { return Int32.MaxValue; } uint fitness = target.Aggregate&lt;string, uint&gt;(0, (current, t) =&gt; current + (regex.IsMatch(t) ? 0U : 1)); uint nonFitness = dontMatch.Aggregate&lt;string, uint&gt;(0, (current, t) =&gt; current + (regex.IsMatch(t) ? 10U : 0)); return fitness + nonFitness; }; for (int targetGeneLength = distinctSymbols.Length; targetGeneLength &lt; genes.Length * 2; targetGeneLength++) { string best = new GeneticSolver(50).GetBestGenetically(targetGeneLength, genes, calcFitness, true); if (calcFitness(best) != 0) { Console.WriteLine("-- not solved with regex of length " + targetGeneLength); continue; } Console.WriteLine("solved with: " + best); break; } } </code></pre> <p>And the result of its application to your samples:</p> <pre><code>public void Given_Sample_A() { var target = new[] { "00", "01", "10" }; var dontMatch = new[] { "11" }; GenerateRegex(target, dontMatch); } </code></pre> <p>output:</p> <pre><code>Generation 1 best: 10 (2) Generation 2 best: 0+ (2) Generation 5 best: 0* (2) Generation 8 best: 00 (2) Generation 9 best: 01 (2) -- not solved with regex of length 2 Generation 1 best: 10* (2) Generation 3 best: 00* (2) Generation 4 best: 01+ (2) Generation 6 best: 10+ (2) Generation 9 best: 00? (2) Generation 11 best: 00+ (2) Generation 14 best: 0?1 (2) Generation 21 best: 0*0 (2) Generation 37 best: 1?0 (2) Generation 43 best: 10? (2) Generation 68 best: 01* (2) Generation 78 best: 1*0 (2) Generation 79 best: 0*1 (2) Generation 84 best: 0?0 (2) Generation 127 best: 01? (2) Generation 142 best: 0+1 (2) Generation 146 best: 0+0 (2) Generation 171 best: 1+0 (2) -- not solved with regex of length 3 Generation 1 best: 1*0+ (1) Generation 2 best: 0+1* (1) Generation 20 best: 1?0+ (1) Generation 31 best: 1?0* (1) -- not solved with regex of length 4 Generation 1 best: 1*00? (1) Generation 2 best: 0*1?0 (1) Generation 3 best: 1?0?0 (1) Generation 4 best: 1?00? (1) Generation 8 best: 1?00* (1) Generation 12 best: 1*0?0 (1) Generation 13 best: 1*00* (1) Generation 41 best: 0*10* (1) Generation 44 best: 1*0*0 (1) -- not solved with regex of length 5 Generation 1 best: 0+(1)? (1) Generation 36 best: 0+()1? (1) Generation 39 best: 0+(1?) (1) Generation 61 best: 1*0+1? (0) solved with: 1*0+1? </code></pre> <p>second sample:</p> <pre><code>public void Given_Sample_B() { var target = new[] { "00", "01", "11" }; var dontMatch = new[] { "10" }; GenerateRegex(target, dontMatch); } </code></pre> <p>output:</p> <pre><code>Generation 1 best: 00 (2) Generation 2 best: 01 (2) Generation 7 best: 0* (2) Generation 12 best: 0+ (2) Generation 33 best: 1+ (2) Generation 36 best: 1* (2) Generation 53 best: 11 (2) -- not solved with regex of length 2 Generation 1 best: 00* (2) Generation 2 best: 0+0 (2) Generation 7 best: 0+1 (2) Generation 12 best: 00? (2) Generation 15 best: 01* (2) Generation 16 best: 0*0 (2) Generation 19 best: 01+ (2) Generation 30 best: 0?0 (2) Generation 32 best: 0*1 (2) Generation 42 best: 11* (2) Generation 43 best: 1+1 (2) Generation 44 best: 00+ (2) Generation 87 best: 01? (2) Generation 96 best: 0?1 (2) Generation 125 best: 11? (2) Generation 126 best: 1?1 (2) Generation 135 best: 11+ (2) Generation 149 best: 1*1 (2) -- not solved with regex of length 3 Generation 1 best: 0*1* (0) solved with: 0*1* </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload