<p>For one, you're calling <code>separate_words(post.text)</code> for every <code>search_word</code> in <code>search_words</code>. You should call <code>separate_words</code> only once for each <code>post</code> in <code>posts</code>.</p>
<p>That is, rather than:</p>
<pre><code>for search_word in search_words:
    for post in posts:
        # do heavy work
</code></pre>
<p>you should instead have:</p>
<pre><code>for post in posts:
    # do the heavy work
    for search_word in search_words:
        ...
</code></pre>
<p>If, as I suspect, <code>separate_words</code> does a lot of string manipulation, don't forget that string manipulation is relatively expensive in Python, since strings are immutable and every modification creates a new string.</p>
<p>Another improvement: you don't have to compare every word in <code>search_words</code> with every word in <code>post_words</code>. If you keep <code>search_words</code> and <code>post_words</code> sorted (or grouped) by word length, you can use a sliding window technique. Since a <code>search_word</code> will only match a <code>post_word</code> if the difference in their lengths is less than 2, you only need to check the post words whose length falls within a window of two on either side, which cuts down the number of words to check, e.g.:</p>
<pre><code>import collections
import itertools

search_words = sorted(search_words, key=len)

# Group post words by length (this could probably be a list of lists instead).
g_post_words = collections.defaultdict(list)
for post_word in post_words:
    g_post_words[len(post_word)].append(post_word)

for search_word in search_words:
    l = len(search_word)
    # Equivalent, more compact form:
    # candidates = itertools.chain.from_iterable(g_post_words.get(m, []) for m in range(l - 2, l + 3))
    candidates = itertools.chain(g_post_words.get(l - 2, []),
                                 g_post_words.get(l - 1, []),
                                 g_post_words.get(l, []),
                                 g_post_words.get(l + 1, []),
                                 g_post_words.get(l + 2, []))
    for post_word in candidates:
        score = calculate_score(search_word, post_word)
        # ... and the rest ...
</code></pre>
<p>(This code probably won't work as is; it's just to illustrate the idea.)</p>
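<p>To make both suggestions concrete, here is a minimal, self-contained sketch that tokenizes each post once and then only scores the length-window candidates. Everything in it is an assumption made for illustration: posts are plain strings rather than objects with a <code>text</code> attribute, and the <code>separate_words</code> and <code>calculate_score</code> bodies are hypothetical stand-ins for your real functions, as are the helper names <code>group_by_length</code>, <code>candidates_for</code> and <code>score_posts</code>.</p>

```python
import collections
import itertools


def separate_words(text):
    # Hypothetical stand-in for the real tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def calculate_score(search_word, post_word):
    # Hypothetical stand-in for the real scoring: 1 for an exact match, else 0.
    return 1 if search_word == post_word else 0


def group_by_length(words):
    # Bucket words by their length so a length window can be looked up directly.
    buckets = collections.defaultdict(list)
    for word in words:
        buckets[len(word)].append(word)
    return buckets


def candidates_for(search_word, buckets, window=2):
    # Yield only the words whose length is within `window` of the search word's.
    n = len(search_word)
    return itertools.chain.from_iterable(
        buckets.get(m, []) for m in range(n - window, n + window + 1)
    )


def score_posts(post_texts, search_words, window=2):
    scores = []
    for text in post_texts:
        # The heavy tokenizing work happens once per post, not once per search word.
        buckets = group_by_length(separate_words(text))
        total = 0
        for search_word in search_words:
            for post_word in candidates_for(search_word, buckets, window):
                total += calculate_score(search_word, post_word)
        scores.append(total)
    return scores
```

<p>Calling <code>score_posts(post_texts, search_words)</code> returns one total per post. The window is a parameter here only so you can widen or narrow it to whatever your real matching rule allows.</p>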
 
