<p>Here's how I would design the table:</p>
<pre><code>CREATE TABLE all_downloads (
  node_id    INT UNSIGNED NOT NULL,
  license_id INT UNSIGNED NOT NULL,
  user_id    INT UNSIGNED NOT NULL,
  timestamp  DATETIME NOT NULL,
  price      NUMERIC (9,2),
  PRIMARY KEY (node_id,license_id,user_id),
  KEY (price)
) ENGINE=InnoDB;
</code></pre>
<p><strong>Notice I omitted the download_id.</strong></p>
<p>Now you can run the queries you need to:</p>
<ul>
<li><p>Get the number of downloads for a given node id and license id over a given time period (how many times has node 5 been downloaded in the last month for 'commercial use'?).</p>
<pre><code>SELECT COUNT(*) FROM all_downloads
WHERE (node_id,license_id) = (123,456)
  AND timestamp &gt; NOW() - INTERVAL 30 DAY;
</code></pre>
<p>This should make good use of the clustered primary index, reducing the set of rows examined until the timestamp comparison only applies to a small subset.</p></li>
<li><p>Get the total number of downloads for a given node id and license id.</p>
<pre><code>SELECT COUNT(*) FROM all_downloads
WHERE (node_id,license_id) = (123,456);
</code></pre>
<p>Like the above, this makes use of the clustered primary index. Counting is accomplished by an index scan.</p></li>
<li><p>Get the number of downloads for a given node_id regardless of license (all downloads for 'commercial use' and 'personal use' combined).</p>
<pre><code>SELECT COUNT(*) FROM all_downloads
WHERE (node_id) = (123);
</code></pre>
<p>Ditto.</p></li>
<li><p>Get the node ids (and corresponding license ids) that have been downloaded by a given user and that meet a given price criterion (i.e. price = 0, or price &gt; 0).</p>
<pre><code>SELECT node_id, license_id FROM all_downloads
WHERE price = 0 AND user_id = 789;
</code></pre>
<p>This reduces the rows examined by using the secondary index on <code>price</code>. Then you take advantage of the fact that secondary indexes in InnoDB implicitly contain the columns of the primary key, so you don't even need to read the base data.
This is called a <em>covering index</em> or an index-only query.</p></li>
</ul>
<p>As for your other questions:</p>
<ul>
<li><p>No, it's not a good practice to define a table without a primary key constraint.</p></li>
<li><p>No, it's not a good practice to store a serialized array in a single column. See my answer for the question "<a href="https://stackoverflow.com/questions/3653462/is-storing-a-comma-separated-list-in-a-database-column-really-that-bad/3653574#3653574">Is storing a comma separated list in a database column really that bad?</a>"</p></li>
</ul>
<hr>
<blockquote>
<p>timestamp ... doesn't really change anything from an optimization standpoint?</p>
</blockquote>
<p>I prefer datetime over timestamp, but not for optimization reasons. Note that a MySQL DATETIME value does not carry time zone information (it is stored and returned exactly as given), so be consistent about which zone you store. You can always convert a datetime to a UNIX timestamp integer in a query result, using the <a href="http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_unix-timestamp" rel="nofollow noreferrer">UNIX_TIMESTAMP()</a> function.</p>
<blockquote>
<p>would it be acceptable to make the primary key a cluster of download_id/node_id/license_id/user_id? Or will having the download_id as the first part of the compound key throw off its usefulness?</p>
</blockquote>
<p>The benefit of a clustered key is that the rows are stored in order of the index. So if you query based on node_id frequently, there's a performance advantage to putting that first in the compound clustered index. I.e. if you are interested in the set of rows for a given node_id, it's a benefit that they're stored together because you defined the clustered index that way.</p>
<blockquote>
<p>Do you think it still makes sense to have a downloads_counted table, or would that be considered redundant?</p>
</blockquote>
<p>Sure, storing aggregate results in a table is a common way to avoid recounting frequently needed totals over and over.
But do so judiciously, because it takes some work to keep these totals in sync with the real data. The benefit is greater if you need to read the pre-calculated totals frequently, and multiple times for each time they are updated. Make sure you treat the aggregated totals as less authoritative than the real download data, and have a plan for re-generating the totals when they get out of sync.</p>
<p>Some people also put these aggregates into memcached keys instead of in a table, for even faster lookups. If the volatile data in memcached is lost for some reason, you can re-populate it from the download data.</p>
<pre><code> PRIMARY KEY (node_id, license_id),
 KEY (node_id)
) ENGINE=InnoDB;
</code></pre>
<blockquote>
<p>is this key redundant, though, if node_id is already the first part of the compound primary key?</p>
</blockquote>
<p>Yes. MySQL allows you to create redundant indexes, and this is an example of a redundant index. Any query that could use the secondary key on node_id could just as easily use the primary key. In fact, in this case the optimizer will <em>never</em> use the secondary key, because it will prefer the clustered index of the primary key.</p>
<p>You can use <a href="http://www.percona.com/doc/percona-toolkit/pt-duplicate-key-checker.html" rel="nofollow noreferrer">pt-duplicate-key-checker</a> to analyze a database for redundant indexes.</p>
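<p>Returning to the point about keeping pre-calculated totals in sync with the real download data, here is a rough sketch of how that might look. It is only an illustration under assumptions: <code>downloads_counted</code> is the table name from the question, but the <code>download_count</code> column and the sample ids are made up here, and your real schema may differ.</p>

```sql
-- Hypothetical summary table; download_count is an assumed column name.
CREATE TABLE downloads_counted (
  node_id        INT UNSIGNED NOT NULL,
  license_id     INT UNSIGNED NOT NULL,
  download_count INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (node_id, license_id)
) ENGINE=InnoDB;

-- Incremental maintenance: run this in the same transaction as the
-- INSERT into all_downloads, so the total moves with the real data.
INSERT INTO downloads_counted (node_id, license_id, download_count)
VALUES (123, 456, 1)
ON DUPLICATE KEY UPDATE download_count = download_count + 1;

-- Regeneration plan: rebuild every total from the authoritative rows
-- whenever the summary is suspected to be out of sync.
START TRANSACTION;
DELETE FROM downloads_counted;
INSERT INTO downloads_counted (node_id, license_id, download_count)
SELECT node_id, license_id, COUNT(*)
FROM all_downloads
GROUP BY node_id, license_id;
COMMIT;
```

<p>The rebuild treats <code>all_downloads</code> as the source of truth, which matches the advice to regard the aggregated totals as less authoritative. On a large table you would probably want to rebuild in batches or into a shadow table instead of one big transaction.</p>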