Note that there are some explanatory texts on larger screens.

plurals
  1. POFast string search using bitwise operators
    primarykey
    data
    text
    <p>What is the fastest (parallel?) way to find a substring in a very long string using bitwise operators?</p> <p>e.g. find all positions of "GCAGCTGAAAACA" sequence in a human genome <a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/hg18.2bit" rel="nofollow noreferrer">http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/hg18.2bit</a> (770MB)</p> <p>*the alphabet consists of 4 symbols ('G','C',T,'A') represented using 2 bits: 'G':00, 'A':01, 'T':10, 'C':11 </p> <p>*you can assume the query string (the shorter one) is fixed in length, e.g. 127 characters</p> <p>*by fastest I mean not including any pre-processing/indexing time</p> <p>*the file is going to be loaded into memory after pre-processing, basically there will be billions of short strings to be searched for in a larger string, all in-memory.</p> <p>*bitwise because I'm looking for the simplest, fastest way to search for a bit pattern in a large bit array and stay as close as possible to the silicon.</p> <p>*KMP wouldn't work well as the alphabet is small</p> <p>*C code, x86 machine code would all be interesting.</p> <p>Input format description (.2bit): <a href="http://jcomeau.freeshell.org/www/genome/2bitformat.html" rel="nofollow noreferrer">http://jcomeau.freeshell.org/www/genome/2bitformat.html</a></p> <p>Related: </p> <p><a href="https://stackoverflow.com/questions/1572290/fastest-way-to-scan-for-bit-pattern-in-a-stream-of-bits">Fastest way to scan for bit pattern in a stream of bits</a></p> <p><a href="https://stackoverflow.com/questions/8811335/algorithm-help-fast-algorithm-in-searching-for-a-string-with-its-partner">Algorithm help! Fast algorithm in searching for a string with its partner</a></p> <p><a href="http://www.arstdesign.com/articles/fastsearch.html" rel="nofollow noreferrer">http://www.arstdesign.com/articles/fastsearch.html</a></p> <p><a href="http://en.wikipedia.org/wiki/Bitap_algorithm" rel="nofollow noreferrer">http://en.wikipedia.org/wiki/Bitap_algorithm</a></p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload