StackOverflow2013

<p><strong>General case</strong>: It's always <em>possible</em> to use a CouchDB-style map-reduce view, but it's not necessarily <em>practical</em>.</p> <p>In the end, it's mostly a counting argument: if you need to ask the question for any subset of your 500,000 products, then your database must be able to provide an answer to each of 2<sup>500,000</sup> different possible questions. That requires a prohibitive amount of storage if you have to emit a B-tree leaf for every one of them (and you do need to emit data, unless the answer to most of these queries is zero, false, an empty set or a similar null value).</p> <p>CouchDB provides a first small optimization through range queries: in the ideal case, as few as N B-tree leaves can answer N<sup>2</sup> questions (roughly one per start key/end key pair). However, in your example, this would only reduce the number of leaves to 2<sup>250,000</sup> (and that's a <em>theoretical</em> lower bound).</p> <p>CouchDB provides a second small optimization through key prefix queries: a single <code>[A,B,C]</code> key can answer the <code>[A]</code>, <code>[A,B]</code> and <code>[A,B,C]</code> queries. So, instead of your 2<sup>250,000</sup> possibilities, you're down to a "mere" 2<sup>249,999</sup>...</p> <p>So, while you could devise an emitting strategy that answers the question for any subset, it would take more storage space than is available on our planet.
In the general case, answering N different questions requires emitting at least <code>sqrt(N/2)</code> B-tree leaves, so count your questions and check whether that lower bound on the number of leaves is acceptable.</p> <p><strong>Only for categories and subcategories</strong>: if you give up on arbitrary lists of products and only ask questions of the form "give me the significant attributes in category A filtered by attributes B and C", then your number of emits drops to:</p> <pre><code> AvgCategories * AvgAttr * 2 ^ (AvgAttr - 1) * 500,000 </code></pre> <p>You're basically emitting, for each product, the keys <code>[Category,Attr,Attr,...]</code> for all categories of the product and all combinations of its attributes, which lets you query by category + attributes. If you have on average 1 category and 3 attributes per product, this works out to 1 × 3 × 2<sup>2</sup> × 500,000 = 6,000,000 entries, which is quite manageable.</p>
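<p>To make the counting argument concrete, here is a small Python sketch (Python is only an illustration language here; nothing in it is CouchDB API) that evaluates the rule-of-thumb lower bound <code>sqrt(N/2)</code> with exact integer arithmetic, so it also works for N = 2<sup>500,000</sup>. The helper name <code>min_leaves</code> is hypothetical.</p>

```python
import math


def min_leaves(num_questions: int) -> int:
    """Smallest integer L with 2 * L**2 >= num_questions, i.e. L >= sqrt(num_questions / 2).

    Hypothetical helper: it only evaluates the rule-of-thumb lower bound,
    it does not say how the emitted keys should be laid out.
    """
    # Exact integer square root, so this also works for counts like 2**500_000
    # that are far too large for floating point.
    root = math.isqrt(num_questions // 2)
    # isqrt rounds down; bump by one if the bound is not yet reached.
    while 2 * root * root < num_questions:
        root += 1
    return root


print(min_leaves(9))  # 3, since sqrt(9 / 2) is about 2.12
# Number of bits of the bound for "any subset of 500,000 products":
print(min_leaves(2**500_000).bit_length())  # 250000
```

<p>A 250,000-bit result means the bound lies between 2<sup>249,999</sup> and 2<sup>250,000</sup> leaves, in line with the estimates above.</p>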
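<p>The emit estimate for the category-plus-attributes case can be checked the same way. The sketch below assumes one particular reading of the formula, which is not necessarily the exact scheme described above: for each category and each attribute <em>a</em> of a product, emit a key <code>[Category, ...filterAttrs, a]</code> for every subset of the product's <em>other</em> attributes (attributes sorted so each subset has a single key order). That yields exactly <code>AvgAttr * 2 ^ (AvgAttr - 1)</code> keys per category. The names <code>estimated_emits</code> and <code>emitted_keys</code> are hypothetical, and the keys are plain Python lists rather than the output of a real CouchDB map function.</p>

```python
from itertools import combinations


def estimated_emits(avg_categories: float, avg_attr: int, products: int) -> float:
    """Back-of-the-envelope emit count, using the formula from the answer:
    AvgCategories * AvgAttr * 2 ^ (AvgAttr - 1) * products."""
    return avg_categories * avg_attr * 2 ** (avg_attr - 1) * products


def emitted_keys(categories, attributes):
    """Hypothetical map step for ONE product (an assumed scheme, not the answer's exact one).

    For every category and every attribute `reported`, emit
    [category, *filters, reported] for each subset `filters` of the other
    attributes. Per category this gives n * 2 ** (n - 1) keys for n attributes.
    """
    attrs = sorted(attributes)  # sorted, so each attribute subset has one key order
    keys = []
    for category in categories:
        for reported in attrs:
            others = [x for x in attrs if x != reported]
            for r in range(len(others) + 1):
                for filters in combinations(others, r):  # keeps sorted order
                    keys.append([category, *filters, reported])
    return keys


print(len(emitted_keys(["A"], ["B", "C", "D"])))  # 12 = 3 * 2**2
print(estimated_emits(1, 3, 500_000))  # 6000000
```

<p>Going from n to n + 1 attributes per product changes the per-category factor from n·2<sup>n-1</sup> to (n+1)·2<sup>n</sup>, i.e. it more than doubles, so the estimate is very sensitive to AvgAttr.</p>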
