<p><em>Note: See the information regarding Graphviz below.</em></p>
<p>This should give you a starting point:</p>
<p><strong>Edit:</strong> This version handles genes that are described by more than one character.</p>
<pre><code>awk '
    BEGIN { regdelim = "|" }
    {
        delim=""
        if ($2 == "+") {
            if (plus[$1]) delim=regdelim
            plus[$1]=plus[$1] delim $3
        } else if ($2 == "-") {
            if (minus[$1]) delim=regdelim
            minus[$1]=minus[$1] delim $3
        }
    }
    END {
        for (root in plus) {
            split(plus[root],regs,regdelim)
            for (reg in regs) {
                if (plus[regs[reg]] &amp;&amp; plus[root] ~ plus[regs[reg]]) {
                    print "Match: ", root, "+", regs[reg], "+", plus[regs[reg]]
                }
            }
        }
    }
' inputfile
</code></pre>
<p>In the <code>BEGIN</code> clause, set <code>regdelim</code> to a character that doesn't appear in your data.</p>
<p>I've omitted the processing code for the minus data.</p>
<p>Output:</p>
<pre><code>Match: a + b + c
Match: f + g + h
</code></pre>
<p><strong>Edit 2:</strong></p>
<p>The version below allows you to search for arbitrary combinations. It generalizes the technique used in the original version so no code needs to be duplicated. It also fixes a couple of other <s>bugs</s><i>limitations</i>.</p>
<pre><code>#!/bin/bash
# written by Dennis Williamson - 2010-11-12
# for http://stackoverflow.com/questions/4161001/counting-the-occurrence-of-a-sub-graph-in-a-graph
# A (AB) B, A (AC) C, B (BC) C - where "(XY)" represents a + or a -
# provided by the positional parameters $1, $2 and $3
# $4 carries the data file name and is referenced at the end of the script
awk -v AB=$1 -v AC=$2 -v BC=$3 '
    BEGIN { regdelim = "|" }
    {
        if ($2 == AB) {
            if (regAB[$1]) delim=regdelim; else delim=""
            regAB[$1]=regAB[$1] delim $3
        }
        if ($2 == AC) {
            if (regAC[$1]) delim=regdelim; else delim=""
            regAC[$1]=regAC[$1] delim $3
        }
        if ($2 == BC) {
            if (regBC[$1]) delim=regdelim; else delim=""
            regBC[$1]=regBC[$1] delim $3
        }
    }
    END {
        for (root in regAB) {
            split(regAB[root],ABarray,regdelim)
            for (ABindex in ABarray) {
                split(regAC[root],ACarray,regdelim)
                for (ACindex in ACarray) {
                    split(regBC[ABarray[ABindex]],BCarray,regdelim)
                    for (BCindex in BCarray) {
                        if (ACarray[ACindex] == BCarray[BCindex]) {
                            print " Match:", root, AB, ABarray[ABindex] ",", root, AC, ACarray[ACindex] ",", ABarray[ABindex], BC, BCarray[BCindex]
                        }
                    }
                }
            }
        }
    }
' "$4"
</code></pre>
<p>This can be called like this to do an exhaustive search:</p>
<pre><code>for ab in + -; do for ac in + -; do for bc in + -; do echo "Searching: $ab$ac$bc"; ./searchgraph $ab $ac $bc inputfile; done; done; done
</code></pre>
<p>For this data:</p>
<pre><code>a - e
a + b
b + c
c - f
m - n
b - d
a + c
b - e
l - n
f + g
b + i
g + h
l + m
f + h
a + i
a - j
k - j
a - k
</code></pre>
<p>The output of the shell loop calling the new version of the script would look like this:</p>
<pre><code>Searching: +++
 Match: a + b, a + c, b + c
 Match: a + b, a + i, b + i
 Match: f + g, f + h, g + h
Searching: ++-
Searching: +-+
Searching: +--
 Match: l + m, l - n, m - n
 Match: a + b, a - e, b - e
Searching: -++
Searching: -+-
Searching: --+
Searching: ---
 Match: a - k, a - j, k - j
</code></pre>
<p><strong>Edit 3:</strong></p>
<h3>Graphviz</h3>
<p>Another approach would be to use <a href="http://www.graphviz.org/" rel="nofollow noreferrer">Graphviz</a>. The <a href="http://www.graphviz.org/doc/info/lang.html" rel="nofollow noreferrer">DOT language</a> can describe the graph and <a href="http://linux.die.net/man/1/gvpr" rel="nofollow noreferrer"><code>gvpr</code></a>, which is an "AWK-like"<sup>1</sup> programming language, can analyze and manipulate DOT files.</p>
<p><a href="https://i.stack.imgur.com/zHJwM.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/zHJwM.png" width="300"></a></p>
<p>Given the input data in the format as shown in the question, you can use the following AWK program to convert it to DOT:</p>
<pre><code>#!/usr/bin/awk -f
BEGIN {
    print "digraph G {"
    print " size=\"5,5\""
    print " ratio=.85"
    print " node [fontsize=24 color=blue penwidth=3]"
    print " edge [fontsize=18 labeldistance=5 labelangle=-8 minlen=2 penwidth=3]"
    print " {rank=same; f l}"
    m = "-"     # ASCII minus/hyphen as in the source data
    um = "−"    # u2212 minus: − which looks better on the output graphic
    p = "+"
}

{
    if ($2 == m) { $2 = um; c = lbf = "red"; arr=" arrowhead = empty" }
    if ($2 == p) { c = lbf = "green3"; arr="" }
    print " " $1, "-&gt;", $3, "[taillabel = \"" $2 "\" color = \"" c "\" labelfontcolor = \"" lbf "\"" arr "]"
}

END {
    print "}"
}
</code></pre>
<p>The command to run would be something like this:</p>
<pre><code>$ ./dat2dot data.dat &gt; data.dot
</code></pre>
<p>You can then create the graphic above using:</p>
<pre><code>$ dot -Tpng -o data.png data.dot
</code></pre>
<p>I used the extended data as given above in this answer.</p>
<p>To do an exhaustive search for the type of subgraphs you specified, you can use the following <code>gvpr</code> program:</p>
<pre><code>BEGIN {
    edge_t AB, BC, AC;
}

E {
    AB = $;
    BC = fstedge(AB.head);
    while (BC &amp;&amp; BC.head.name != AB.head.name) {
        AC = isEdge(AB.tail,BC.head,"");
        if (AC) {
            printf("%s %s %s, ", AB.tail.name, AB.taillabel, AB.head.name);
            printf("%s %s %s, ", AC.tail.name, AC.taillabel, AC.head.name);
            printf("%s %s %s\n", BC.tail.name, BC.taillabel, BC.head.name);
        }
        BC = nxtedge(BC, AB.head);
    }
}
</code></pre>
<p>To run it, you could use:</p>
<pre><code>$ gvpr -f groups.g data.dot | sort -k 2,2 -k 5,5 -k 8,8
</code></pre>
<p>The output would be similar to that from the AWK/shell combination above (under "Edit 2"):</p>
<pre><code>a + b, a + c, b + c
a + b, a + i, b + i
f + g, f + h, g + h
a + b, a − e, b − e
l + m, l − n, m − n
a − k, a − j, k − j
</code></pre>
<hr>
<p><sup><strong>1</strong></sup><sub> Loosely speaking.</sub></p>
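<p>As a cross-check on the searches described in this answer, here is a minimal sketch (not part of the original answer) that lists every A&rarr;B, A&rarr;C, B&rarr;C triangle in a single AWK pass, whatever the signs are. The function name <code>find_triangles</code> is hypothetical, and the sketch assumes that node names contain no whitespace and that each ordered pair of nodes has at most one edge. It loops over all node triples, so it is only meant for small graphs.</p>

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: print each triangle
# A->B, A->C, B->C found in an edge list of "tail sign head" lines.
find_triangles() {
    awk '
    {
        sign[$1, $3] = $2        # sign of the edge $1 -> $3
        nodes[$1]; nodes[$3]     # referencing an element records the node
    }
    END {
        for (a in nodes)
            for (b in nodes) {
                if (!((a, b) in sign)) continue      # need edge A -> B
                for (c in nodes)
                    if (((a, c) in sign) && ((b, c) in sign))
                        print a, sign[a, b], b ",", a, sign[a, c], c ",", b, sign[b, c], c
            }
    }' "$1" | LC_ALL=C sort      # fixed locale so the order is reproducible
}
```

<p>Called as <code>find_triangles inputfile</code> on the sample data above, this should report the same six triangles as the exhaustive shell loop, one per line and without the <code>Searching:</code> headers.</p>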