Note that there are some explanatory texts on larger screens.

plurals
  1. POBillions rows in PostgreSql: partition or not to partition?
    text
    copied!<p><strong>What i have:</strong></p> <ul> <li>Simple server with one xeon with 8 logic cores, 16 gb ram, mdadm raid1 of 2x 7200rpm drives.</li> <li>PostgreSql</li> <li>A lot of data to work with. Up to 30 millions of rows are being imported per day.</li> <li>Time - complex queries can be executed up to an hour</li> </ul> <p><strong>Simplified schema of table, that will be very big:</strong></p> <pre><code>id| integer | not null default nextval('table_id_seq'::regclass) url_id | integer | not null domain_id | integer | not null position | integer | not null </code></pre> <p>The problem with the schema above is that I don't have the exact answer on how to partition it. Data for all periods is going to be used (<b>NO queries will have date filters</b>).</p> <p>I thought about partitioning on "domain_id" field, but the problem is that it is hard to predict how many rows each partition will have.</p> <p>My main question is:</p> <p><strong>Does is make sense to partition data if i don't use partition pruning and i am not going to delete old data?</strong></p> <p><strong>What will be pros/cons of that ?</strong></p> <p><strong>How will degrade my import speed, if i won't do partitioning?</strong></p> <p>Another question related to normalization:</p> <p><strong>Should url be exported to another table?</strong></p> <p><strong>Pros of normalization</strong></p> <ul> <li>Table is going to have rows with average size of 20-30 bytes.</li> <li>Joins on "url_id" are supposed to be much faster than on "url" field</li> </ul> <p><strong>Pros of denormalization</strong></p> <ul> <li>Data can be imported much, much faster, as i don't have to make lookup into "url" table before each insert.</li> </ul> <p><strong>Can anybody give me any advice? Thanks!</strong></p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload