Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>Start by looking into <code>partition</code>ing your table if you haven't already:</p> <p><a href="http://dev.mysql.com/doc/refman/5.1/en/partitioning.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.1/en/partitioning.html</a></p> <p><a href="http://www.slideshare.net/datacharmer/mysql-partitions-tutorial" rel="nofollow">http://www.slideshare.net/datacharmer/mysql-partitions-tutorial</a></p> <p><a href="http://blog.mayflower.de/archives/353-Is-MySQL-partitioning-useful-for-very-big-real-life-problems.html" rel="nofollow">http://blog.mayflower.de/archives/353-Is-MySQL-partitioning-useful-for-very-big-real-life-problems.html</a></p> <p>How are you 'consolidating' your data? Maybe the method you are using isn't optimal. One good approach (let me know if this is actually what you are doing) is to create a table that contains aggregated data. Then set it up this way:</p> <p>First putting aside how the data is being dumped into your main table...</p> <ul> <li><p>Create a job (cron or whatever you may have handy or already configured) that runs at a specified interval, relative to how the data is loaded into the main table (let's call it <code>MAIN</code>, moving forward). If your MAIN table gets loaded hourly, then sync it. Half-hourly? Doesn't matter. You can check the speed anyway, or if it's near off-peak hours that your reports run, then schedule near then</p></li> <li><p>Properly index your table for consolidated data. Let's call it <code>AGG</code> moving forward.</p></li> <li><p>Create a stored procedure that loads data from MAIN to AGG, which is basically an <code>AGG LOAD FOR INTERVAL-?</code>. Of course, you're the only one here who knows how or when the data gets inserted into MAIN, so you'll also be the one who knows what the aggregation intention is. It's also possible to keep running the aggregating stored procedure if the aggregation intention is not completed (say it's for an entire day.. so it is an accumulative run until that is set)</p></li> <li><p>Use <code>STAGING</code> tables. <em>For me, they're the best</em>.</p></li> <li><p>Create a stored procedure that re-checks the data, so that any updates or additional insertion of records can be reflected in the AGG table by running this procedure. Include parameters for the range to update. If it's daily, then you have a <code>DAILY AGG LOAD</code> and <code>DAILY AGG RELOAD</code> procedure. Include an <code>AGG CHECK INTERVAL</code> and <code>AGG CHECK DAILY</code> procedure which will help you sleep well at night. Oh and not to mention a <code>AGG DATA HOLE CHECK</code> or a <code>MISSING AGG DATA CHECK</code> and apply business rules that implement checking for a required minimum amount of data which you can get from the aggregated table or from the main table or staging table (preferrably)</p></li> <li><p>Of course, never modify the <code>AGG</code> table. Always only reload it.</p></li> <li><p>How does this help? Wouldn't you then only need to have your reports query the <code>AGG</code> table, which is smaller, and faster (since the aggregation has been done already)? Maybe the performance issue comes in with the interval loading, but if you properly structure your table, its indexes and it's maintenance, it should be worth it.</p></li> <li><p>Where does partitioning come in? Archiving. Once a certain time has passed (discuss what's acceptable with your team/boss/top man) you can archive the old data from <code>MAIN</code>. I experienced having to keep 1 year's worth of data in the production database. That kinda felt like a drag, but because it was the client's request, the company had no choice but to give me the disk space I needed (rubs hands) and boy did I play around with it until I got something running decently. I must mention that my experience was with Microsoft SQL Server 2005, and stored procedures and SSIS made it fun.</p></li> </ul> <p>This is all if you don't know it already, and for others who may want to consider options. I'm not saying you didn't know any of the above already; I'm just stating what I have been able to do before -- considering that I didn't have more information to work with from your post, except that you have a consolidation process that you tried.. </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload