Note that there are some explanatory texts on larger screens.

plurals
  1. POSparse data: efficient storage and retrieval in an RDBMS
    text
    copied!<p>I have a table representing values of source file metrics across project revisions, like the following:</p> <pre><code>Revision FileA FileB FileC FileD FileE ... 1 45 3 12 123 124 2 45 3 12 123 124 3 45 3 12 123 124 4 48 3 12 123 124 5 48 3 12 123 124 6 48 3 12 123 124 7 48 15 12 123 124 </code></pre> <p>(The relational view of the above data is different. Each row contains the following columns: Revision, FileId, Value. The files and their revisions from which the data is calculated are stored in Subversion repositories, so we're trying to represent the repository's structure in a relational schema.) </p> <p>There can be up to 23750 files in 10000 revisions (this is the case for the <a href="http://en.wikipedia.org/wiki/ImageMagick" rel="nofollow noreferrer">ImageMagick</a> drawing program). As you can see, most values are the same between successive revisions, so the table's useful data is quite sparse. I am looking for a way to store the data that </p> <ul> <li>avoids replication and uses space efficiently (currently the non-sparse representation requires 260 GB (data+index) for less than 10% of the data I want to store)</li> <li>allows me to retrieve efficiently the values for a specific revision using an SQL query (without explicitly looping through revisions or files) </li> <li>allows me to retrieve efficiently the revision for a specific metric value.</li> </ul> <p>Ideally, the solution should not depend on a particular <a href="http://en.wikipedia.org/wiki/Relational_database_management_system" rel="nofollow noreferrer">RDBMS</a> and should be compatible with <a href="http://en.wikipedia.org/wiki/Hibernate_%28Java%29" rel="nofollow noreferrer">Hibernate</a>. If this is not possible, I can live with using Hibernate, MySQL or PostgreSQL-specific features.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload