StackOverflow2013

plurals

PO
primarykey
Id
8055568
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
12
CommunityOwnedDate
CreationDate
2011-11-08T18:48:13.557
FavoriteCount
0
LastActivityDate
2011-11-12T16:41:53.450
LastEditDate
2017-05-23T11:47:59.803
LastEditorUserId
-1
OwnerUserId
20860
ParentId
8055322
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
Here's how I would design the table:
<pre><code>CREATE TABLE all_downloads (
  node_id INT UNSIGNED NOT NULL,
  license_id INT UNSIGNED NOT NULL,
  user_id INT UNSIGNED NOT NULL,
  timestamp DATETIME NOT NULL,
  price NUMERIC (9,2),
  PRIMARY KEY (node_id,license_id,user_id),
  KEY (price)
) ENGINE=InnoDB;
</code></pre>
Notice I omitted the download_id. Now you can run the queries you need to:
<ul>
<li>Get the number of downloads for a given node id and license id over a given time period (how many times has node 5 been downloaded in the last month for 'commercial use'?).
<pre><code>SELECT COUNT(*)
FROM all_downloads
WHERE (node_id,license_id) = (123,456)
  AND timestamp > NOW() - INTERVAL 30 DAY
</code></pre>
This should make good use of the clustered primary index, reducing the set of rows examined until the timestamp comparison only applies to a small subset.</li>
<li>Get the total number of downloads for a given node id and license id.
<pre><code>SELECT COUNT(*)
FROM all_downloads
WHERE (node_id,license_id) = (123,456);
</code></pre>
Like the above, this makes use of the clustered primary index. Counting is accomplished by an index scan.</li>
<li>Get the number of downloads for a given node_id regardless of license (all downloads for 'commercial use' and 'personal use' combined).
<pre><code>SELECT COUNT(*)
FROM all_downloads
WHERE (node_id) = (123);
</code></pre>
Ditto.</li>
<li>Get the node ids (and corresponding license ids) that have been downloaded by a given user that meet a given price criteria (i.e. price = 0, or price > 0).
<pre><code>SELECT node_id, license_id
FROM all_downloads
WHERE price = 0 AND user_id = 789;
</code></pre>
This reduces the rows examined by using the secondary index on <code>price</code>. Then you take advantage of the fact that secondary indexes in InnoDB implicitly contain the columns of the primary key, so you don't even need to read the base data.
This is called a covering index or an index-only query.</li>
</ul>
As for your other questions:
<ul>
<li>No, it's not a good practice to define a table without a primary key constraint.</li>
<li>No, it's not a good practice to store a serialized array in a single column. See my answer for the question "<a href="https://stackoverflow.com/questions/3653462/is-storing-a-comma-separated-list-in-a-database-column-really-that-bad/3653574#3653574">Is storing a comma separated list in a database column really that bad?</a>"</li>
</ul>
<hr>
<blockquote> timestamp ... doesn't really change anything from an optimization standpoint? </blockquote>
I prefer datetime over timestamp, though not for optimization reasons: MySQL stores a datetime exactly as given, whereas it converts a timestamp from the session time zone to UTC for storage and back again on retrieval. Neither type stores time zone information itself. You can always convert a datetime to a UNIX timestamp integer in a query result, using the <a href="http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_unix-timestamp" rel="nofollow noreferrer">UNIX_TIMESTAMP()</a> function.
<blockquote> would it be acceptable to make the primary key a cluster of download_id/node_id/license_id/user_id? Or will having the download_id as the first part of the compound key throw off its usefulness? </blockquote>
The benefit of a clustered key is that the rows are stored in order of the index. So if you query based on node_id frequently, there's a performance advantage to putting that first in the compound clustered index. I.e. if you are interested in the set of rows for a given node_id, it's a benefit that they're stored together because you defined the clustered index that way.
<blockquote> Do you think it still makes sense to have a downloads_counted table, or would that be considered redundant? </blockquote>
Sure, storing aggregate results in a table is a common way to avoid repeatedly counting up frequently needed totals. But do so judiciously, because it takes some work to keep these totals in sync with the real data.
The benefit is greater if you need to read the pre-calculated totals frequently, and multiple times for each time they are updated. Make sure you treat the aggregated totals as less authoritative than the real download data, and have a plan for re-generating the totals when they get out of sync. Some people also put these aggregates into memcached keys instead of in a table, for even faster lookups. If the volatile data in memcached is lost for some reason, you can re-populate it from the download data.
<pre><code>  PRIMARY KEY (node_id, license_id),
  KEY (node_id)
) ENGINE=InnoDB;
</code></pre>
<blockquote> is this key redundant, though, if node_id is already the first part of the compound primary key? </blockquote>
Yes. MySQL allows you to create redundant indexes, and this is an example of a redundant index. Any query that could use the secondary key on node_id could just as easily use the primary key. In fact, in this case the optimizer will never use the secondary key, because it will prefer the clustered index of the primary key. You can use <a href="http://www.percona.com/doc/percona-toolkit/pt-duplicate-key-checker.html" rel="nofollow noreferrer">pt-duplicate-key-checker</a> to analyze a database for redundant indexes.
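<hr>
As a rough sketch of the earlier point about keeping pre-calculated totals in sync and re-generating them: the statements below assume a hypothetical <code>downloads_counted</code> table with a <code>download_count</code> column. Neither that column name nor the full table definition appears in the question excerpt, so treat both as illustrative. The first statement bumps a total as a download is recorded; the transaction rebuilds every total from <code>all_downloads</code>, which remains the authoritative data.
<pre><code>-- Hypothetical summary table; only its key columns are known from the question.
CREATE TABLE downloads_counted (
  node_id INT UNSIGNED NOT NULL,
  license_id INT UNSIGNED NOT NULL,
  download_count INT UNSIGNED NOT NULL DEFAULT 0,  -- assumed column name
  PRIMARY KEY (node_id, license_id)
) ENGINE=InnoDB;

-- Incremental maintenance: run alongside each insert into all_downloads.
INSERT INTO downloads_counted (node_id, license_id, download_count)
VALUES (123, 456, 1)
ON DUPLICATE KEY UPDATE download_count = download_count + 1;

-- Full rebuild from the authoritative data when the totals drift.
START TRANSACTION;
DELETE FROM downloads_counted;
INSERT INTO downloads_counted (node_id, license_id, download_count)
SELECT node_id, license_id, COUNT(*)
FROM all_downloads
GROUP BY node_id, license_id;
COMMIT;
</code></pre>
Running the rebuild inside a single InnoDB transaction means that a failure partway through rolls back to the old totals rather than leaving the summary table empty.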
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO Normalizing/optimizing structure of large mysql table
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US Community
UserOwnerUserId
1. US Bill Karwin
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PO Normalizing/optimizing structure of large mysql table
 singulars
 PostTypePostTypeId
 PT Question
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT AcceptedByOriginator
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId