StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>The simple answer is that CouchDB views aren't ideal for this.</p> <p>The more complicated answer is that this type of query tends to be very inefficient in typical SQL engines too, and so if you grant that there will be tradeoffs with <em>any</em> solution then CouchDB actually has the benefit of letting you choose your tradeoff.</p> <p>1. <strong>The SQL Way</strong></p> <p>When you do <code>SELECT ... WHERE name LIKE %bro%</code>, all the SQL engines I'm familiar with must do what's called a "full table scan". This means the server reads every row in the relevant table, and brute force scans the field to see if it matches.</p> <p>You can do this in CouchDB too, with a <a href="http://wiki.apache.org/couchdb/HTTP_view_API#Temporary_Views" rel="noreferrer">temporary view</a>:</p> <pre><code>POST /some_database/_temp_view {"map": "function (doc) { if (doc.name && doc.name.indexOf('bro') !== -1) emit(null); }"} </code></pre> <p>This will look through every single document in the database and give you a list of matching documents. You can tweak the map function to also match on a document type, or to emit with a certain key for ordering — <code>emit(doc.timestamp)</code> — or some data value useful to your purpose — <code>emit(null, doc.name)</code>.</p> <p>2. <strong>The "tons of disk space available" way</strong></p> <p>Depending on your source data size you could create an index that emits every possible "interior string" as its permanent (on-disk) view key. That is to say for a name like "Dobros" you would <code>emit("dobros"); emit("obros"); emit("bros"); emit("ros"); emit("os"); emit("s");</code>. Then for a term like '%bro%' you could query your view with <code>startkey="bro"&endkey="bro\uFFFF"</code> to get all occurrences of the lookup term. Your index will be approximately the size of your text content <em>squared</em>, but if you need to do an arbitrary "find in string" faster than the full DB scan above and have the space this might work. You'd be better served by a data structure designed for <a href="http://en.wikipedia.org/wiki/Substring_index" rel="noreferrer">substring searching</a> though.</p> <p>Which brings us too...</p> <p>3. <strong>The Full Text Search way</strong></p> <p>You could use a CouchDB plugin (<a href="https://github.com/rnewson/couchdb-lucene" rel="noreferrer">couchdb-lucene</a>, <a href="http://www.elasticsearch.org/tutorials/2010/08/01/couchb-integration.html" rel="noreferrer">ElasticSearch</a>, <a href="http://apidoc.apsw.googlecode.com/hg/couchdb.html" rel="noreferrer">SQLite's FTS</a>) to generate an auxiliary text-oriented index into your documents.</p> <p>Note that most full text search indexes <a href="http://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F" rel="noreferrer">don't naturally support</a> arbitrary wildcard prefixes either, likely for similar reasons of space efficiency as we saw above. Usually full text search doesn't imply "brute force binary search", but "word search". YMMV though, take a look around at the options available in your full text engine.</p> <p>If you don't really need to find "bro" <em>anywhere</em> in a field, you can implement basic "find a word starting with X" search with regular CouchDB views by just splitting on various locale-specific word separators and omitting these "words" as your view keys. This will be more efficient than above, scaling proportionally to the amount of data indexed.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload