<p>About 15 years ago I wrote a fuzzy search that can find the N closest neighbors. It is my modification of Wilbur's trigram algorithm, and the modification is named the "Wilbur-Khovayko algorithm".</p>
<p>Basic idea: split strings into trigrams and search for the maximal intersection scores.</p>
<p>For example, suppose we have the string "hello world". This string generates the trigrams hel, ell, llo, "lo_", "o_w" (where _ marks the space), and so on. It also produces special prefix/suffix trigrams for each word, such as $he, $wo, lo$, ld$.</p>
<p>Then an index is built that records, for each trigram, which terms contain it.</p>
<p>In other words, each trigram maps to a list of term_IDs.</p>
<p>When the user submits a query string, it is split into trigrams in the same way, and the program searches for the terms with the maximal intersection scores and produces a list of N results.</p>
<p>It is fast: as I recall, on an old Sun/Solaris machine with 256 MB of RAM and a 200 MHz CPU, it found the 100 closest terms in a dictionary of 5,000,000 terms in 0.25 s.</p>
<p>You can get my old source from: <a href="http://olegh.ftp.sh/wilbur-khovayko.tar.gz" rel="nofollow">http://olegh.ftp.sh/wilbur-khovayko.tar.gz</a></p>
<p>UPDATE:</p>
<p>I created a new archive in which the Makefile is adjusted for modern Linux/BSD make. You can download the new version here: <a href="http://olegh.ftp.sh/wilbur-khovayko.tgz" rel="nofollow">http://olegh.ftp.sh/wilbur-khovayko.tgz</a></p>
<p>Create a directory and extract the archive there:</p>
<pre><code>mkdir F2
cd F2
tar xvfz wilbur-khovayko.tgz
make
</code></pre>
<p>Go to the test directory, copy in the term list file (the name is fixed: termlist.txt), and build the index:</p>
<pre><code>cd test/
cp /tmp/test/termlist.txt ./termlist.txt
./crefdb.exe &lt;termlist.txt
</code></pre>
<p>In this test, I used ~380,000 expired domain names:</p>
<pre><code>wc -l termlist.txt
379430 termlist.txt
</code></pre>
<p>Run the findtest application:</p>
<pre><code>./findtest.exe
boking &lt;-- this is the query: the word "booking" with a misspelling
0001:Query: [boking]
1: 287890 ( 3.863739) [bokintheusa.com,2009-11-20,$69]
2: 287906 ( 3.569148) [bookingseu.com,2009-11-20,$69]
3: 257170 ( 3.565942) [bokitko.com,2009-11-18,$69]
4: 302830 ( 3.413791) [bookingcenters.com,2009-11-21,$69]
5: 274658 ( 3.408325) [bookingsadept.com,2009-11-19,$69]
6: 100438 ( 3.379371) [bookingresorts.com,2009-11-09,$69]
7: 203401 ( 3.363858) [bookinginternet.com,2009-11-15,$69]
8: 221222 ( 3.361689) [bobokiosk.com,2009-11-16,$69]
. . . .
97: 29035 ( 2.169753) [buccupbooking.com,2009-11-05,$69]
98: 185692 ( 2.169047) [box-hosting.net,2009-11-14,$69]
99: 345394 ( 2.168371) [birminghamcookinglessons.com,2009-11-25,$69]
100: 150134 ( 2.167372) [bowlingbrain.com,2009-11-12,$69]
</code></pre>
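<p>To make the idea concrete, here is a minimal Python sketch of the trigram index and lookup described above. It is not the code from the archive: the function names (trigrams, build_index, search) are made up for illustration, and the score is assumed to be a plain count of shared trigrams, whereas the real program prints weighted floating-point scores whose formula is not given here.</p>

```python
import heapq
from collections import defaultdict


def trigrams(text):
    """Return the set of trigrams of text, plus '$'-marked word boundary trigrams."""
    text = text.lower()
    # Sliding window of three characters over the whole string (spaces included).
    grams = {text[i:i + 3] for i in range(len(text) - 2)}
    # Special prefix/suffix trigrams for each word, e.g. "$he" and "lo$" for "hello".
    for word in text.split():
        if len(word) >= 2:
            grams.add("$" + word[:2])
            grams.add(word[-2:] + "$")
    return grams


def build_index(terms):
    """Map each trigram to the list of term IDs (positions in terms) containing it."""
    index = defaultdict(list)
    for term_id, term in enumerate(terms):
        for gram in trigrams(term):
            index[gram].append(term_id)
    return index


def search(index, terms, query, n=10):
    """Return up to n (term, score) pairs, best first.

    Assumption: score = number of trigrams shared with the query.
    Ties are broken by the lower term ID.
    """
    hits = defaultdict(int)
    for gram in trigrams(query):
        for term_id in index.get(gram, ()):
            hits[term_id] += 1
    best = heapq.nlargest(n, hits.items(), key=lambda kv: (kv[1], -kv[0]))
    return [(terms[term_id], score) for term_id, score in best]


if __name__ == "__main__":
    # Hypothetical three-term dictionary, just to exercise the functions.
    terms = ["booking.com", "bowling.net", "cooking.org"]
    index = build_index(terms)
    for term, score in search(index, terms, "boking", n=3):
        print(score, term)
```

<p>Only the posting lists of the query's own trigrams are scanned, so the work depends on how many terms share a trigram with the query rather than on the size of the whole dictionary. A weighted score would replace the "+= 1" step.</p>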