
    <p>If this were something like a very large data warehouse fact table with a time component by which the data had to be queried efficiently (say, DATE_OF_SALE), then a common implementation would be a relational database table partitioned on that value.</p>
    <p>In Oracle this would typically be range partitioning, so I'll address how that is implemented internally.</p>
    <p>A regular unpartitioned table can be considered to be a set of column and table metadata (table name, column names, data types, etc.) plus a "physical" data segment that stores the actual data. A full table scan has to read every block of that segment below the High Water Mark.</p>
    <p>Partitioning breaks the table into multiple segments, each of which is logically constrained to hold a particular set of data. That set could be defined by a list of values for a particular column (the partitioning key), or by the result of a hash function applied to a column, or, as in this case, by a range of values of a column.</p>
    <p>The query optimiser detects the presence of a predicate on a partition key column and attempts to isolate the minimum set of partitions that might contain candidate data. Those partitions can then be scanned, or accessed via indexes dedicated to each partition. This is known as Partition Pruning, and it results in much faster scans because large sets of data are eliminated from consideration.</p>
    <p>In more engineered systems, such as Oracle's Exadata, there can be structures that store the maximum and minimum values of columns for sets of contiguous data blocks, sized in the low megabytes range. In this case a full scan of a table or partition can skip any set of data blocks whose minimum and maximum values rule out the possibility that candidate rows exist in it. Oracle calls these structures Storage Indexes.</p>
    <p>So, apologies for the Oracle-heavy approach, but similar implementations exist in other relational and non-relational databases, and they can offer much greater performance than indexes.</p>
    <p>One issue with indexes, by the way, is that they impose no implicit organisation on the table's data, so an index scan of 20% of the table data is quite possibly going to be less efficient than a full scan, because of the repeated single-block accesses to the table's data segment. Some RDBMSs allow the physical order of the rows to be set. PostgreSQL, for example, allows a table to be clustered by an index's columns: this is a one-off rewrite of the table in the order of the index, and it improves index-based access until the data becomes disorganised again through the addition of new rows or the update of existing rows.</p>
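    <p>To make the two ideas above concrete, here is a minimal in-memory sketch in Python. It is not how Oracle or any other database implements these features internally; the names <code>Partition</code>, <code>RangePartitionedTable</code>, <code>build_zone_map</code> and <code>scan_with_zone_map</code> are hypothetical and chosen only for illustration. The first part models range partitioning with pruning on a sale date, and the second models a storage-index-style structure that keeps the minimum and maximum date for each run of rows ("block") so that a scan can skip blocks that cannot contain matches.</p>

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Partition:
    """One segment, constrained to rows with low <= sale_date < high."""
    low: date
    high: date
    rows: list = field(default_factory=list)


class RangePartitionedTable:
    """Toy range-partitioned table keyed on a sale date (illustrative only)."""

    def __init__(self, boundaries):
        # boundaries is a sorted list of dates; partition i covers
        # the half-open range [boundaries[i], boundaries[i + 1]).
        self.partitions = [
            Partition(lo, hi) for lo, hi in zip(boundaries, boundaries[1:])
        ]

    def insert(self, sale_date, amount):
        # Route the row to the single partition whose range accepts it.
        for p in self.partitions:
            if p.low <= sale_date < p.high:
                p.rows.append((sale_date, amount))
                return
        raise ValueError(f"no partition accepts {sale_date}")

    def prune(self, start, end):
        # Keep only partitions whose range overlaps [start, end).
        return [p for p in self.partitions if p.low < end and start < p.high]

    def query(self, start, end):
        # Scan the surviving partitions and report how many were touched.
        scanned = self.prune(start, end)
        rows = [r for p in scanned for r in p.rows if start <= r[0] < end]
        return rows, len(scanned)


def build_zone_map(rows, block_size):
    """Record (min_date, max_date, block) for each run of block_size rows."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    return [(min(r[0] for r in b), max(r[0] for r in b), b) for b in blocks]


def scan_with_zone_map(zone_map, start, end):
    """Full scan that skips blocks whose min/max rule out any match."""
    hits, blocks_read = [], 0
    for lo, hi, block in zone_map:
        if hi < start or lo >= end:
            continue  # no candidate row can exist in this block
        blocks_read += 1
        hits.extend(r for r in block if start <= r[0] < end)
    return hits, blocks_read
```

    <p>With quarterly boundaries for one year and one row per month, a query for a single month touches one partition out of four, and the min/max scan reads one block out of four when rows are stored in date order. The min/max approach only helps when the data is at least roughly ordered on the queried column; on randomly ordered rows every block's range would overlap the predicate and nothing would be skipped.</p>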
 
