Sharding when you don't have a good partition function
    <p>Edit: I see that the partitioning functionality of some RDBMSs (Postgres: <a href="http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html" rel="nofollow">http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html</a>) provides much of what I'm looking for. I'd still be interested in specific algorithms and best practices for managing the partitioning.</p>
    <p>Horizontally scaling relational databases is often achieved by sharding data onto <em>n</em> servers based on some function that splits the data into <em>n</em> buckets of roughly equal expected size. Queries stay efficient as long as they all contain the shard key, and stay useful as long as the data is partitioned into mutually independent sets.</p>
    <p>What is the best approach to horizontally scaling a relational database when you <em>don't</em> have any function with the above properties?</p>
    <p>For example, in a multi-tenant situation, some tenants may produce barely any data and some may produce a full server's worth (or more). There's no way to know in advance which is which, and almost all of the queries you want to run are over a tenant's entire dataset.</p>
    <p>I couldn't find much literature on this. The best solution I can think of is:</p>
    <ul> <li>Initially partition based on some naive equal-splitting function into <em>n</em> groups.</li> <li>When any server fills up, increment <em>n</em> (or increase it by some other amount or factor), then re-partition the data.</li> <li>When a tenant takes up more than some percentage of the space on a server, move it to its own server, and add a special case to the partitioning function.</li> </ul>
    <p>This is pretty complicated and would require a lot of complex logic in your application's sharding layer (not to mention copying large sets of data between servers), but it seems like it wouldn't be too hard to semi-automate. If you were careful, you could change the sharding function over time in a way that minimizes the amount of data relocated from one server to another.</p>
    <p>Am I completely barking up the wrong tree?</p>
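<p>The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps the "naive equal-splitting function" for rendezvous (highest-random-weight) hashing, which the question does not mention, because adding a server then relocates only the tenants that now hash to the new server. The names <code>ShardRouter</code>, <code>add_shard</code>, <code>pin_tenant</code>, and the server names are hypothetical, and the actual copying of data between servers is left out.</p>

```python
import hashlib
from dataclasses import dataclass, field


def _score(tenant_id: str, shard: str) -> int:
    """Deterministic pseudo-random weight for a (tenant, shard) pair."""
    digest = hashlib.sha256(f"{tenant_id}|{shard}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class ShardRouter:
    """Hypothetical router: rendezvous hashing plus per-tenant overrides."""

    # Shared servers that take part in the hash-based placement.
    shards: list[str]
    # Special cases: tenant id -> dedicated server (assumed to be kept
    # out of `shards` so that no other tenant hashes onto it).
    overrides: dict[str, str] = field(default_factory=dict)

    def shard_for(self, tenant_id: str) -> str:
        # Oversized tenants are routed by the override table first.
        if tenant_id in self.overrides:
            return self.overrides[tenant_id]
        # Everyone else goes to the shard with the highest score.
        return max(self.shards, key=lambda shard: _score(tenant_id, shard))

    def add_shard(self, shard: str) -> None:
        # "Increment n": only tenants whose top score is now the new
        # shard change placement; all other tenants stay where they are.
        self.shards.append(shard)

    def pin_tenant(self, tenant_id: str, dedicated_shard: str) -> None:
        # Move a large tenant to its own server via a special case.
        self.overrides[tenant_id] = dedicated_shard


router = ShardRouter(shards=["db0", "db1", "db2", "db3"])
tenants = [f"tenant-{i}" for i in range(1000)]
before = {t: router.shard_for(t) for t in tenants}

router.add_shard("db4")
after = {t: router.shard_for(t) for t in tenants}
moved = [t for t in tenants if before[t] != after[t]]
# Every relocated tenant ends up on the newly added shard.
print(len(moved), all(after[t] == "db4" for t in moved))

router.pin_tenant("tenant-42", "db-dedicated-1")
print(router.shard_for("tenant-42"))
```

<p>With a plain <code>hash(tenant) % n</code> function, changing <em>n</em> would reshuffle most tenants. The sketch trades that for a slightly more expensive lookup (one hash per shard), which is usually negligible for a modest number of servers. The override table is the "special case" from the third bullet and would need to live somewhere every application instance can read it.</p>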