StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
4175375
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2010-11-13T23:51:07.977
FavoriteCount
0
LastActivityDate
2010-11-14T01:50:16.160
LastEditDate
2010-11-14T01:50:16.160
LastEditorUserId
232706
OwnerUserId
232706
ParentId
4174258
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
I might me be able to provide a better answer if you also provided some data, particular some example keywords and example <code>title</code>s from each of your tables so we can get a sense of what you're trying to actually match on. But I will try to answer with what you have provided. First let me put in English what I think your query will do and then I'll break down the reasons why and ways to fix it. <pre><code>Perform a full table scan of all instances of `exercises` For each row in `exercises` Find all categories attached via exerciseCategories For each combination of exercise and category Perform a full table scan of all instances of exerciseCategories Look up corresponding category Perform RLIKE match on title Perform a full table scan of all instances of exerciseSearchtags Look up corresponding searchtag Perform RLIKE match on title Join back to exercises table to re-lookup self Perform RLIKE match on title </code></pre> Assuming that you have at least a few sane indexes, this will work out to be <code>E x C x (C + S + 1)</code> where <code>E</code> is the number of exercises, <code>C</code> is the average number of categories for a given exercise, and <code>S</code> is the average number of search tags for a given. If you don't have indexes on at least the IDs you listed, then it will perform far worse. So part of the question depends particularly on the relative sizes of <code>C</code> and <code>S</code> which I can currently only guess at. If <code>E</code> is 1000 and <code>C</code> and <code>S</code> are each about 2-3 then you'll be scanning 8-21000 rows. If <code>E</code> is 1 million and <code>C</code> is 2-3 and <code>S</code> is 10-15, you'll be scanning 26-57 million rows. If <code>E</code> is 1 million and <code>C</code> or <code>S</code> is about 1000, then you'll be scanning well over 1 trillion rows. So no, this won't scale well at all. 1) The LEFT JOINs inside of your subqueries are ignored because the WERE clauses on those same queries forces them to be normal JOINs. This doesn't affect performance much but it does obfuscate your intent. 2) RLIKE (and its alias REGEXP) do not ever utilize indexes AFAIK so they will not ever scale. I can only guess without sample data but I would say that if your searches require matching on word boundaries that you are in need of normalizing your data. Even if your titles seem like natural strings to store, searching through part of them means you're really treating them as a collection of words. So you should either make use of mysql's full text search <a href="http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html" rel="nofollow">capabilities</a> or else you should break you titles out into separate tables that store one word per row. The one row per word will obviously increase your storage but would make your queries almost trivial since you appear to only be doing whole word matches (as opposed to similar words, word roots, etc). 3) The final left joins you have are what cause the <code>E x C</code> part of my formula, you will being doing the same work <code>C</code> times for every exercise. Now, admittedly, under most query plans the subqueries will be cached for each category and so its not in practice quite as bad as I'm suggesting but that will not be true in every case so I'm giving you the worst case scenario. Even if you could verify that you have the proper indexes in place and the query optimizer has avoided all those extra table scans, you will still be returning lots of redundant data because your results will look something like this: <pre><code>Exercise 1 info Exercise 1 info Exercise 1 info Exercise 2 info Exercise 2 info Exercise 2 info etc </code></pre> because each exercise row is duplicated for each exercisecategory entry even though you're not returning anything from exercisecategory or categories (and the categories.ID in your first subquery is actually referencing the categories joined in that subquery NOT the one from the outer query). 4) Since most search engines return results using paging, I would guess you only really need the first X results. Adding a LIMIT X to your query, or better yet LIMIT Y, X where Y is the current page and X is the number of results returned per page will greatly help optimize your query if the search keywords return lots of results. If you can provide us with a little more information on your data, I can update my answer to reflect that. UPDATE Based on your responses, here is my suggested query. Unfortunately, without full text search or indexed words, there are still going to be scaling problems if either your category table or your search tag table is very large. <pre><code> SELECT exercises.ID AS ID, exercises.title AS title, IF(exercises.title RLIKE CONCAT('[[:<:]]',?), 1, 0) + (SELECT COUNT(*) FROM categories JOIN exerciseCategories ON exerciseCategories.categoryID = categories.ID WHERE exerciseCategories.exerciseID = exercises.ID AND categories.title RLIKE CONCAT('[[:<:]]',?)) + (SELECT COUNT(*) FROM searchtags JOIN exerciseSearchtags ON exerciseSearchtags.searchtagID = searchtags.ID WHERE exerciseSearchtags.exerciseID = exercises.ID AND searchtags.title RLIKE CONCAT('[[:<:]]',?)) FROM exercises </code></pre> ORDER BY relevance DESC HAVING relevance > 0 LIMIT $start, $results I wouldn't normally recommend a HAVING clause but its not gonna be any worse than your RLIKE ... OR RLIKE ..., etc. This addresses my issues #1, #3, #4 but leaves #2 still remaining. Given your example data, I would imagine that each table only has at most a few dozen entries. In that case, the inefficiency of RLIKE might not be painful enough to be worth the optimizations of one word per row but you did ask about scaling. Only an exact equality (<code>title = ?</code>) query or a starts with query (<code>title LIKE 'foo%'</code> ) can use indexes which are an absolute necessity if you are going to scale up the rows in any table. RLIKE and REGEXP don't fit those criteria, no matter the regular expression used (and yours is a 'contains' like query which is the worst case). (It's important to note that <code>title LIKE CONCAT(?, '%')</code> is NOT good enough because mysql sees that it has to calculate something and ignores its index. You need to add the '%' in your application.)
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POIs there anything I can do to optimize this mysql query?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USRob Van Dam
UserOwnerUserId
1. USRob Van Dam
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POIs there anything I can do to optimize this mysql query?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.