Note that there are some explanatory texts on larger screens.

plurals
  1. POPHP, MySQL, Efficient tag-driven search algorithm
    primarykey
    data
    text
    <p>I'm currenlty building a webshop. This shop allows users to filter products by <code>category</code>, and a couple optional, additional filters such as <code>brand</code>, <code>color</code>, etc.</p> <p>At the moment, various properties are stored in different places, but I'd like to switch to a tag-based system. Ideally, my database should store tags with the following data:</p> <ul> <li><code>product_id</code> </li> <li><code>tag_url_alias</code> (unique)</li> <li><code>tag_type</code> (unique) (category, product_brand, product_color, etc.)</li> <li><code>tag_value</code> (not unique)</li> </ul> <h1>First objective</h1> <p>I would like to search for <code>product_id</code>'s that are associated with anywhere <strong><em>between 1-5 particular tags</em></strong>. The tags are extracted from a SEO-friendly url. So I will be retrieving a unique strings (the <code>tag_url_alias</code>) for each tag, but I won't know the <code>tag_type</code>. The search will be an <strong><em>intersection</em></strong>, so my search should return the <code>product_id</code>'s that match <strong><em>all</em></strong> of the provided <code>tags</code>.</p> <h1>Second objective</h1> <p>Besides displaying the products that match the current filter, I would also like to display the product-count for other categories and filters which the user might supply.</p> <p>For instance, my current search is for products that match the tags: </p> <pre><code>Shoe + Black + Adidas </code></pre> <p>Now, a visitor of the shop might be looking at the resulting products and wonder which black shoes other brands have to offer. So they might go to the "brand" filter, and choose any of the other listed brands. Lets say they have 2 different options (in practice, this will probably have many more), resulting in the following searches:</p> <pre><code>Shoe + Black + Nike &gt; 103 results Shoe + Black + K-swiss &gt; 0 results </code></pre> <p>In this case, if they see the brand "K-swiss" listed as an available choise in their filter, their search will return 0 results.</p> <p>This is obviously rather disappointing to the user... I'd much rather know that switching the "brand" from "adidas" to "k-swiss" will 0 results, and simply remove the entire option from the filter.</p> <p>Same thing goes for categories, colors, etc.</p> <p>In practice this would mean a single page view would not only return the filtered product list described in my primary objective, but potentially hundreds of similar yet different lists. One for each filter value that could replace another filter value, or be added to the existing filter values.</p> <h1>Capacity</h1> <p>I suspect my database will eventually contain:</p> <blockquote> <p>between 250 and 1.000 unique tags</p> </blockquote> <p>And it will contain:</p> <blockquote> <p>between 10.000 and 100.000 unique products </p> </blockquote> <h1>Current Ideas</h1> <p>I did some Google searches and found the following article: <a href="http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html</a></p> <p>Judging by that article, running hundreds of queries to achieve the 2nd objective, is going to be a painfully slow route. The "toxy" example might work for my needs and it might be acceptable for my First objective, but it would be unacceptably slow for the Second objective.</p> <p>I was thinking I might run individual queries that match 1 <code>tag</code> to it's associated <code>product_id</code>'s, cache those queries, and then calculate intersections on the results. But, do I calculate these intersections in MySQL? or in PHP? If I use MySQL, is there a particular way I should cache these individual queries, or is supplying the right indexes all I need?</p> <p>I would imagine it's also quite possible to maybe even cache the intersections between two of these <code>tag</code>/<code>product_id</code> sets. The amount of intersections would be limited by the fact that a <code>tag_type</code> can have only one particular value, but I'm not sure how to efficiently manage this type of caching. Again, I don't know if I should do this in MySQL or in PHP. And if I do this in MySQL, what would be the best way to store and combine this type of cached results?</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload