Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>An unconventional approach using Perl is below.</p> <pre><code>#! /usr/bin/perl use warnings; use strict; my $graph = q{ a + c b + c c - f b - d a + b f + g g + h f + h }; my $nodes = join ",", sort keys %{ { map +($_ =&gt; 1), $graph =~ /(\w+)/g } }; my $search = "$nodes:$nodes:$nodes:$graph"; my $subgraph = qr/ \A .*? (?&lt;x&gt;\w+) .*: .*? (?&lt;y&gt;\w+) .*: .*? (?&lt;z&gt;\w+) .*: (?= .*^\s* \k&lt;x&gt; \s* \+ \s* \k&lt;y&gt; \s*$) (?= .*^\s* \k&lt;y&gt; \s* \+ \s* \k&lt;z&gt; \s*$) (?= .*^\s* \k&lt;x&gt; \s* \+ \s* \k&lt;z&gt; \s*$) (?{ print "x=$+{x}, y=$+{y}, z=$+{z}\n" }) (?!) /smx; $search =~ /$subgraph/; </code></pre> <p>The regex engine <a href="http://perl.plover.com/NPC/" rel="nofollow">is a powerful tool</a>. For your problem, we describe the structure of a transitive subgraph and then allow the resulting machine to go find all of them.</p> <p>Output:</p> <pre>x=a, y=b, z=c x=f, y=g, z=h</pre> <hr> <p>A more general tool using this same technique is possible. For example, let's say you want to be able to specify gene patterns such as <code>a+b+c;a+c</code> or <code>g1+g2-g3;g1+g3</code>. I hope the meanings of these patterns are obvious.</p> <p>In the front matter, I specify a minimum version of 5.10.0 because the code uses <code>//</code> and lexical <code>$_</code>. The code constructs dynamic regexes that will evaluate code, which the <code>use re 'eval'</code> pragma enables.</p> <pre><code>#! /usr/bin/perl use warnings; use strict; use 5.10.0; use re 'eval'; </code></pre> <p>An identifier is a sequence of one or more “word characters,” <em>i.e.</em>, letters, digits, or underscores.</p> <pre><code>my $ID = qr/\w+/; </code></pre> <p>Given a regex that accepts variable names, <code>unique_vars</code> searches some specification for all variable names and returns them without repetition.</p> <pre><code>sub unique_vars { my($_,$pattern) = @_; keys %{ { map +($_ =&gt; undef), /($pattern)/g } }; } </code></pre> <p>Compiling a gene pattern into a regex is a little hairy. It dynamically generates a search target and regex with the same form as the static one above.</p> <p>The first part with multiple occurrences of comma-separated variables lets the regex engine try each possible value for each gene. Then the lookaheads, <code>(?=...)</code>, scan the graph looking for edges with the desired properties. If all the lookaheads succeed, we record the hit.</p> <p>The strange regex <code>(?!)</code> at the end is an unconditional failure that forces the matcher to backtrack and attempt the match with different genes. Because it's unconditional, the engine will evaluate all possibilities.</p> <p>Calling the same closure from multiple threads concurrently will likely produce strange results.</p> <pre><code>sub compile_gene_pattern { my($dataset,$pattern) = @_; my @vars = sort +unique_vars $pattern, qr/[a-z]\d*/; # / for SO hilite my $nodes = join ",", sort +unique_vars $dataset, $ID; my $search = join("", map "$_:", ($nodes) x @vars) . "\n" . $dataset; my $spec = '\A' . "\n" . join("", map ".*? (?&lt;$_&gt;$ID) .*:\n", @vars); for (split /;/, $pattern) { while (s/^($ID)([-+])($ID)/$3/) { $spec .= '(?= .*^\s* ' . ' \b\k&lt;' . $1 . '&gt;\b ' . ' \s*' . quotemeta($2) . '\s* ' . ' \b\k&lt;' . $3 . '&gt;\b ' . ' \s*$)' . "\n"; } } my %hits; $spec .= '(?{ ++$hits{join "-", @+{@vars}} })' . "\n" . '(?!) # backtrack' . "\n"; my $nfa = eval { qr/$spec/smx } || die "$0: INTERNAL: bad regex:\n$@"; sub { %hits = (); # thread-safety? :-( (my $_ = $search) =~ /$nfa/; map [split /-/], sort keys %hits; } } </code></pre> <p>Read the dataset and let the user know about any problems.</p> <pre><code>sub read_dataset { my($path) = @_; open my $fh, "&lt;", $path or die "$0: open $path: $!"; local $/ = "\n"; local $_; my $graph; my @errors; while (&lt;$fh&gt;) { next if /^\s*#/ || /^\s*$/; if (/^ \s* $ID \s* [-+] \s* $ID \s* $/x) { $graph .= $_; } else { push @errors, "$.: syntax error"; } } return $graph unless @errors; die map "$0: $path:$_\n", @errors; } </code></pre> <p>Now we set it all into motion:</p> <pre><code>my $graphs = shift // "graphs.txt"; my $dataset = read_dataset $graphs; my $ppp = compile_gene_pattern $dataset, "a+b+c;a+c"; print "@$_\n" for $ppp-&gt;(); my $pmp = compile_gene_pattern $dataset, "g1+g2-g3;g1+g3"; print "@$_\n" for $pmp-&gt;(); </code></pre> <p>Given <code>graphs.txt</code> with contents</p> <pre>a + b b + c c - f b - d a + c f + g g + h f + h foo + bar bar - baz foo + baz</pre> <p>and then running the program, we get the following output:</p> <pre>a b c f g h foo bar baz</pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload