If you split the 10 records first, then you're finding a small number of strings in many larger strings. This seems to fit the family of [algorithms using a finite set of patterns](http://en.wikipedia.org/wiki/String_searching_algorithm#Algorithms_using_finite_set_of_patterns), and the [Aho-Corasick algorithm](http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm) might work well for you.

How long are the records?

EDIT:

This is an unnecessary switcharound; your comparison is symmetric with respect to firstArray and secondArray:

```csharp
if (firstArray.Length > secondArray.Length)
{
    string[] tempArray = firstArray;
    firstArray = secondArray;
    secondArray = tempArray;
}
```

Instead, replace the return with

`return findLongest ? value : (firstArray.Length > secondArray.Length ? value / secondArray.Length : value / firstArray.Length);`

only with something more readable :)

UPDATE after question update:

So you could pre-process the 100,000 records (e.g. hash the words)? And since only 10-20 change per day, keeping the pre-processed data up to date would be easy.

You definitely need to do something that exploits the relatively static nature of the 100,000 records. Even if you did the pre-processing just once per day, you could compare against all of the previous day's records, then use your current slower approach for any records added since the last pre-processing run. From what you say, there will be at most 10-20 of those.

I think either the hashing idea, or building an Aho-Corasick trie from the corpus, would give you much faster searching. The sketches below illustrate both.
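As a rough illustration of the pre-processing/hashing idea (a sketch added here, not code from the answer), one option is an inverted index mapping each word in the corpus to the ids of the records containing it; only records that share at least one word with a changed record then need the expensive pairwise comparison. The names `CorpusIndex`, `Build` and `Candidates` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CorpusIndex
{
    // Map each word to the ids of the corpus records that contain it.
    // Rebuild once per day; the 100,000-record corpus is effectively static.
    public static Dictionary<string, HashSet<int>> Build(IList<string> corpus)
    {
        var index = new Dictionary<string, HashSet<int>>(StringComparer.OrdinalIgnoreCase);
        for (int id = 0; id < corpus.Count; id++)
        {
            foreach (var word in corpus[id].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!index.TryGetValue(word, out var ids))
                    index[word] = ids = new HashSet<int>();
                ids.Add(id);
            }
        }
        return index;
    }

    // Records sharing at least one word with the changed record are the
    // only candidates that need the expensive pairwise comparison.
    public static IEnumerable<int> Candidates(Dictionary<string, HashSet<int>> index, string record)
    {
        return record.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                     .Where(index.ContainsKey)
                     .SelectMany(word => index[word])
                     .Distinct();
    }
}
```

Rebuilding the index once per day matches the answer's suggestion; the 10-20 records added since the last rebuild can still go through the slow path.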
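And a minimal Aho-Corasick sketch, assuming the patterns are the words of the small changed set streamed against each larger record (the direction can equally be flipped to build the trie from the corpus, as the last paragraph of the answer suggests). This is an illustrative implementation with assumed names, not the answer's code.

```csharp
using System.Collections.Generic;

// Minimal Aho-Corasick automaton: a trie over the patterns plus failure
// links, so all patterns are matched in a single pass over each text.
class AhoCorasick
{
    class Node
    {
        public Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;
        public List<string> Output = new List<string>();
    }

    readonly Node root = new Node();

    public void AddPattern(string pattern)
    {
        var node = root;
        foreach (char c in pattern)
        {
            if (!node.Next.TryGetValue(c, out var child))
            {
                child = new Node();
                node.Next[c] = child;
            }
            node = child;
        }
        node.Output.Add(pattern);
    }

    // Build failure links with a breadth-first pass over the trie.
    public void Build()
    {
        var queue = new Queue<Node>();
        foreach (var child in root.Next.Values)
        {
            child.Fail = root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var kv in node.Next)
            {
                char c = kv.Key;
                var child = kv.Value;
                var fail = node.Fail;
                while (fail != null && !fail.Next.ContainsKey(c))
                    fail = fail.Fail;
                child.Fail = fail == null ? root : fail.Next[c];
                child.Output.AddRange(child.Fail.Output);
                queue.Enqueue(child);
            }
        }
    }

    // Yields every pattern occurrence found in the text.
    public IEnumerable<string> Search(string text)
    {
        var node = root;
        foreach (char c in text)
        {
            while (node != root && !node.Next.ContainsKey(c))
                node = node.Fail;
            if (node.Next.TryGetValue(c, out var next))
                node = next;
            foreach (var match in node.Output)
                yield return match;
        }
    }
}
```

Usage would be: call `AddPattern` for each word, `Build()` once, then `Search(text)` for each record. The work is roughly proportional to the total text length plus the number of matches, rather than patterns times text.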
 
