StackOverflow2013


plurals

PHP, MySQL, Efficient tag-driven search algorithm
primarykey
Id
12897817
data
AcceptedAnswerId
12898203
AnswerCount
2
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2012-10-15T14:30:26.057
FavoriteCount
5
LastActivityDate
2012-10-15T14:50:55.920
LastEditDate
LastEditorUserId
0
OwnerUserId
940661
ParentId
0
PostTypeId
1
Score
11
ViewCount
7636
LastEditorDisplayName
text
Body
I'm currently building a webshop. This shop allows users to filter products by <code>category</code> and a couple of optional, additional filters such as <code>brand</code>, <code>color</code>, etc. At the moment, various properties are stored in different places, but I'd like to switch to a tag-based system. Ideally, my database should store tags with the following data:
<ul> <li><code>product_id</code> </li> <li><code>tag_url_alias</code> (unique)</li> <li><code>tag_type</code> (unique) (category, product_brand, product_color, etc.)</li> <li><code>tag_value</code> (not unique)</li> </ul>
<h1>First objective</h1>
I would like to search for <code>product_id</code>'s that are associated with between 1 and 5 particular tags. The tags are extracted from an SEO-friendly URL. So I will be retrieving a unique string (the <code>tag_url_alias</code>) for each tag, but I won't know the <code>tag_type</code>. The search will be an intersection, so it should return the <code>product_id</code>'s that match all of the provided <code>tags</code>.
<h1>Second objective</h1>
Besides displaying the products that match the current filter, I would also like to display the product count for other categories and filters which the user might supply. For instance, my current search is for products that match the tags:
<pre><code>Shoe + Black + Adidas
</code></pre>
Now, a visitor of the shop might be looking at the resulting products and wonder which black shoes other brands have to offer. So they might go to the "brand" filter and choose any of the other listed brands. Let's say they have 2 different options (in practice, there will probably be many more), resulting in the following searches:
<pre><code>Shoe + Black + Nike > 103 results
Shoe + Black + K-swiss > 0 results
</code></pre>
In this case, if they see the brand "K-swiss" listed as an available choice in their filter, their search will return 0 results. This is obviously rather disappointing to the user...
I'd much rather know in advance that switching the "brand" from "adidas" to "k-swiss" will return 0 results, and simply remove the entire option from the filter. The same goes for categories, colors, etc. In practice this would mean a single page view would not only return the filtered product list described in my first objective, but potentially hundreds of similar yet different lists: one for each filter value that could replace another filter value, or be added to the existing filter values.
<h1>Capacity</h1>
I suspect my database will eventually contain: <blockquote> between 250 and 1,000 unique tags </blockquote> And it will contain: <blockquote> between 10,000 and 100,000 unique products </blockquote>
<h1>Current Ideas</h1>
I did some Google searches and found the following article: <a href="http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html</a> Judging by that article, running hundreds of queries to achieve the second objective is going to be a painfully slow route. The "Toxi" example might work for my needs and might be acceptable for my first objective, but it would be unacceptably slow for the second objective. I was thinking I might run individual queries that match 1 <code>tag</code> to its associated <code>product_id</code>'s, cache those queries, and then calculate intersections on the results. But do I calculate these intersections in MySQL or in PHP? If I use MySQL, is there a particular way I should cache these individual queries, or is supplying the right indexes all I need? I would imagine it's also possible to cache the intersections between two of these <code>tag</code>/<code>product_id</code> sets. The number of intersections would be limited by the fact that a <code>tag_type</code> can have only one particular value, but I'm not sure how to efficiently manage this type of caching.
Again, I don't know if I should do this in MySQL or in PHP. And if I do this in MySQL, what would be the best way to store and combine these cached results?
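To make the two objectives concrete, here is a minimal sketch of what the queries might look like. It is an illustration under assumptions, not a tested solution for the shop: the <code>product_tags</code> table, its sample rows, and the function names <code>matching_products</code> and <code>facet_counts</code> are all hypothetical. Python's built-in <code>sqlite3</code> module is used only so the example runs on its own; the SQL relies on standard <code>GROUP BY</code> / <code>HAVING</code> and should carry over to MySQL with little change.

```python
import sqlite3

# Hypothetical schema: one row per (product, tag) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_tags (
    product_id    INTEGER NOT NULL,
    tag_url_alias TEXT    NOT NULL,
    tag_type      TEXT    NOT NULL,
    PRIMARY KEY (tag_url_alias, product_id)
);
""")
# Made-up sample data, purely for illustration.
rows = [
    (1, "shoe", "category"), (1, "black", "product_color"), (1, "adidas", "product_brand"),
    (2, "shoe", "category"), (2, "black", "product_color"), (2, "nike", "product_brand"),
    (3, "shoe", "category"), (3, "white", "product_color"), (3, "k-swiss", "product_brand"),
    (4, "shirt", "category"), (4, "black", "product_color"), (4, "nike", "product_brand"),
]
conn.executemany("INSERT INTO product_tags VALUES (?, ?, ?)", rows)


def matching_products(conn, aliases):
    """First objective: ids of products that carry every alias in `aliases`."""
    aliases = sorted(set(aliases))
    marks = ",".join("?" * len(aliases))
    sql = f"""
        SELECT product_id
        FROM product_tags
        WHERE tag_url_alias IN ({marks})
        GROUP BY product_id
        HAVING COUNT(DISTINCT tag_url_alias) = ?
        ORDER BY product_id
    """
    return [r[0] for r in conn.execute(sql, [*aliases, len(aliases)])]


def facet_counts(conn, aliases, tag_type):
    """Second objective: product count per value of `tag_type`, keeping the
    selected tags of all other types fixed. Values with no matching
    products are simply absent from the result."""
    aliases = sorted(set(aliases))
    marks = ",".join("?" * len(aliases))
    # The URL gives aliases only, so look up which ones belong to other types.
    others = [r[0] for r in conn.execute(
        f"SELECT DISTINCT tag_url_alias FROM product_tags "
        f"WHERE tag_url_alias IN ({marks}) AND tag_type <> ?",
        [*aliases, tag_type])]
    if others:
        marks = ",".join("?" * len(others))
        base = f"""SELECT product_id FROM product_tags
                   WHERE tag_url_alias IN ({marks})
                   GROUP BY product_id
                   HAVING COUNT(DISTINCT tag_url_alias) = ?"""
        params = [*others, len(others), tag_type]
    else:
        # No other filters are active: every product is a candidate.
        base = "SELECT DISTINCT product_id FROM product_tags"
        params = [tag_type]
    sql = f"""
        SELECT t.tag_url_alias, COUNT(*)
        FROM product_tags AS t
        JOIN ({base}) AS m ON m.product_id = t.product_id
        WHERE t.tag_type = ?
        GROUP BY t.tag_url_alias
    """
    return dict(conn.execute(sql, params))


selected = ["shoe", "black", "adidas"]
print(matching_products(conn, selected))
print(facet_counts(conn, selected, "product_brand"))
```

With this sample data, "k-swiss" does not appear among the brand counts for black shoes, so that option could be hidden without running a separate query for each brand. One query per filter type (rather than per filter value) would then cover the second objective. Whether that is fast enough at 100,000 products is something I would still have to measure, presumably with an index that leads on <code>tag_url_alias</code>.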
Tags
<php><mysql><performance><search><tags>
Title
PHP, MySQL, Efficient tag-driven search algorithm
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USRuben Ray Vreeken
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POPHP, MySQL, Efficient tag-driven search algorithm
 UserUserId
 USRuben Ray Vreeken
 VoteTypeVoteTypeId
 VTFavorite
2. VO
 singulars
 PostPostId
 POPHP, MySQL, Efficient tag-driven search algorithm
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POPHP, MySQL, Efficient tag-driven search algorithm
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
