<p>Since you are looking for a library, have you taken a look at PyLucene?</p>
<p><a href="http://lucene.apache.org/pylucene/features.html" rel="nofollow noreferrer">http://lucene.apache.org/pylucene/features.html</a></p>
<p>While Lucene usually performs ranked retrieval (matches ordered by a relative score) rather than exact matching, it can also be used for exact phrase searching. Here is a link showing how to use Lucene to search for an exact phrase. It is written in Java, but it gives the idea:</p>
<p><a href="https://stackoverflow.com/questions/5527868/exact-phrase-search-using-lucene">Exact Phrase search using Lucene?</a></p>
<p>Your question asked specifically about efficiency. Efficiency in what way? I assume that you meant the fastest look-up time for the user. If you are indeed talking about speed <em>purely in terms of look-up time for the user</em>, then there is no faster way than actually indexing all words in the documents, <em>provided that you are willing to endure the initial time to index all documents in the corpus</em>. This is usually the logical choice, since indexing is a one-time event and user searches are a frequent occurrence. It does come at the cost of considerably more memory, though. So, if you are talking about efficiency in terms of memory usage, then you would want to loop over all documents and perform a regex search on each one. You would also use this method if you wanted to avoid the initial cost of indexing, though, again, that cost is unlikely to be the limiting factor given a large corpus, and given that the concern is usually satisfying a user who will make multiple queries.</p>
<p>The only other thing I would point out is that, since you mentioned you are searching for patterns and not just words, indexing <em>only the words</em> won't help if you are trying to support querying for patterns (unless the pattern is one of the words in the document!).</p>
<p>If you aren't going to use Lucene and instead want to implement this on your own, take a look at indexing with inverted indices. Here is an excellent explanation of how to create inverted indices if you are looking to do phrasal queries:</p>
<p><a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html" rel="nofollow noreferrer">http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html</a></p>
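<p>To make the trade-off concrete, here is a minimal Python sketch of both approaches: a per-query regex scan (low memory, slow look-ups) and a positional inverted index (more memory, faster look-ups). This is not Lucene's API. The function names (<code>regex_scan</code>, <code>build_positional_index</code>, <code>phrase_query</code>), the simple <code>\w+</code> tokenizer, and the sample documents are all assumptions made for illustration.</p>

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def regex_scan(docs, phrase):
    """Low-memory approach: scan every document again on every query."""
    words = tokenize(phrase)
    if not words:
        return []
    # Match the words in order, separated by non-word characters.
    pattern = re.compile(
        r"\b" + r"\W+".join(re.escape(w) for w in words) + r"\b",
        re.IGNORECASE,
    )
    return sorted(doc_id for doc_id, text in docs.items() if pattern.search(text))


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; built once, reused for every query."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Return ids of documents that contain the words of `phrase` consecutively."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    # Only documents that contain every word can match.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        later = [set(index[w][doc_id]) for w in words[1:]]
        # A start position matches if each following word sits at the next offset.
        if any(
            all(start + i + 1 in positions for i, positions in enumerate(later))
            for start in index[words[0]][doc_id]
        ):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample corpus, for illustration only.
docs = {
    1: "The quick brown fox jumps over the lazy dog.",
    2: "A brown quick fox is still a fox.",
    3: "Quick, brown fox!",
}
index = build_positional_index(docs)
print(regex_scan(docs, "quick brown fox"))     # [1, 3]
print(phrase_query(index, "quick brown fox"))  # [1, 3]
```

<p>Storing positions rather than just document ids is what makes the phrase check possible, and it is also where the extra memory goes. Note that this index, like any word-level index, only answers word and phrase queries; arbitrary patterns would still need the regex scan.</p>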
 
