Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>You should read the following and learn a little bit about the advantages of a well designed innodb table and how best to use clustered indexes - only available with innodb !</p> <p><a href="http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html" rel="noreferrer">http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html</a></p> <p><a href="http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/" rel="noreferrer">http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/</a></p> <p>then design your system something along the lines of the following simplified example:</p> <h2>Example schema (simplified)</h2> <p>The important features are that the tables use the innodb engine and the primary key for the threads table is no longer a single auto_incrementing key but a composite <strong>clustered</strong> key based on a combination of forum_id and thread_id. e.g.</p> <pre><code>threads - primary key (forum_id, thread_id) forum_id thread_id ======== ========= 1 1 1 2 1 3 1 ... 1 2058300 2 1 2 2 2 3 2 ... 2 2352141 ... </code></pre> <p>Each forum row includes a counter called next_thread_id (unsigned int) which is maintained by a trigger and increments every time a thread is added to a given forum. This also means we can store 4 billion threads per forum rather than 4 billion threads in total if using a single auto_increment primary key for thread_id.</p> <pre><code>forum_id title next_thread_id ======== ===== ============== 1 forum 1 2058300 2 forum 2 2352141 3 forum 3 2482805 4 forum 4 3740957 ... 64 forum 64 3243097 65 forum 65 15000000 -- ooh a big one 66 forum 66 5038900 67 forum 67 4449764 ... 247 forum 247 0 -- still loading data for half the forums ! 248 forum 248 0 249 forum 249 0 250 forum 250 0 </code></pre> <p>The disadvantage of using a composite key is that you can no longer just select a thread by a single key value as follows:</p> <pre><code>select * from threads where thread_id = y; </code></pre> <p>you have to do:</p> <pre><code>select * from threads where forum_id = x and thread_id = y; </code></pre> <p>However, your application code should be aware of which forum a user is browsing so it's not exactly difficult to implement - store the currently viewed forum_id in a session variable or hidden form field etc...</p> <p>Here's the simplified schema:</p> <pre><code>drop table if exists forums; create table forums ( forum_id smallint unsigned not null auto_increment primary key, title varchar(255) unique not null, next_thread_id int unsigned not null default 0 -- count of threads in each forum )engine=innodb; drop table if exists threads; create table threads ( forum_id smallint unsigned not null, thread_id int unsigned not null default 0, reply_count int unsigned not null default 0, hash char(32) not null, created_date datetime not null, primary key (forum_id, thread_id, reply_count) -- composite clustered index )engine=innodb; delimiter # create trigger threads_before_ins_trig before insert on threads for each row begin declare v_id int unsigned default 0; select next_thread_id + 1 into v_id from forums where forum_id = new.forum_id; set new.thread_id = v_id; update forums set next_thread_id = v_id where forum_id = new.forum_id; end# delimiter ; </code></pre> <p>You may have noticed I've included reply_count as part of the primary key which is a bit strange as (forum_id, thread_id) composite is unique in itself. This is just an index optimisation which saves some I/O when queries that use reply_count are executed. Please refer to the 2 links above for further info on this.</p> <h2>Example queries</h2> <p>I'm still loading data into my example tables and so far I have a loaded approx. 500 million rows (half as many as your system). When the load process is complete I should expect to have approx:</p> <pre><code>250 forums * 5 million threads = 1250 000 000 (1.2 billion rows) </code></pre> <p>I've deliberately made some of the forums contain more than 5 million threads for example, forum 65 has 15 million threads:</p> <pre><code>forum_id title next_thread_id ======== ===== ============== 65 forum 65 15000000 -- ooh a big one </code></pre> <h2>Query runtimes</h2> <pre><code>select sum(next_thread_id) from forums; sum(next_thread_id) =================== 539,155,433 (500 million threads so far and still growing...) </code></pre> <p>under innodb summing the next_thread_ids to give a total thread count is much faster than the usual:</p> <pre><code>select count(*) from threads; </code></pre> <p>How many threads does forum 65 have:</p> <pre><code>select next_thread_id from forums where forum_id = 65 next_thread_id ============== 15,000,000 (15 million) </code></pre> <p>again this is faster than the usual:</p> <pre><code>select count(*) from threads where forum_id = 65 </code></pre> <p>Ok now we know we have about 500 million threads so far and forum 65 has 15 million threads - let's see how the schema performs :)</p> <pre><code>select forum_id, thread_id from threads where forum_id = 65 and reply_count &gt; 64 order by thread_id desc limit 32; runtime = 0.022 secs select forum_id, thread_id from threads where forum_id = 65 and reply_count &gt; 1 order by thread_id desc limit 10000, 100; runtime = 0.027 secs </code></pre> <p>Looks pretty performant to me - so that's a single table with 500+ million rows (and growing) with a query that covers 15 million rows in 0.02 seconds (while under load !)</p> <h2>Further optimisations</h2> <p>These would include:</p> <ul> <li><p>partitioning by range </p></li> <li><p>sharding</p></li> <li><p>throwing money and hardware at it</p></li> </ul> <p>etc...</p> <p>hope you find this answer helpful :)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload