<p>Summary:<br> Git’s pack files are carefully constructed to use disk caches effectively and to provide “nice” access patterns for common commands and for reading recently referenced objects.</p>
<hr>
<p>Git’s pack file format is quite flexible (see <a href="http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/pack-format.txt;h=1803e64e465fa4f8f0fe520fc0fd95d0c9def5bd;hb=HEAD">Documentation/technical/pack-format.txt</a>, or <a href="http://book.git-scm.com/7_the_packfile.html">The Packfile</a> in <a href="http://book.git-scm.com/index.html">The Git Community Book</a>). Pack files store objects in two main ways: “undeltified” (take the raw object data and deflate-compress it), or “deltified” (form a delta against some other object, then deflate-compress the resulting delta data). The objects stored in a pack can be in any order (they do not have to be sorted by object type, object name, or any other attribute), and a deltified object can use any other suitable object of the same type as its base.</p>
<p>Git’s <a href="http://www.kernel.org/pub/software/scm/git/docs/git-pack-objects.html"><em>pack-objects</em></a> command uses several <a href="http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/pack-heuristics.txt;h=103eb5d989349c8e7e0147920b2e218caba9daf9;hb=HEAD">heuristics</a> to provide excellent <a href="http://en.wikipedia.org/wiki/Locality_of_reference">locality of reference</a> for common commands. These heuristics control both the selection of base objects for deltified objects and the order of the objects. The two mechanisms are mostly independent, but they share some goals.</p>
<p>Git does form long chains of delta-compressed objects, but the heuristics try to make sure that only “old” objects are at the ends of the long chains. 
The delta base cache (whose size is controlled by the <code>core.deltaBaseCacheLimit</code> configuration variable) is used automatically and can greatly reduce the number of “rebuilds” required for commands that need to read a large number of objects (e.g. <code>git log -p</code>).</p>
<h1>Delta Compression Heuristic</h1>
<p>A typical Git repository stores a very large number of objects, so it cannot reasonably compare them all to find the pairs (and chains) that will yield the smallest delta representations.</p>
<p>The delta base selection heuristic is based on the idea that good delta bases will be found among objects with similar filenames and sizes. Each type of object is processed separately (i.e. an object of one type will never be used as the delta base for an object of another type).</p>
<p>For the purposes of delta base selection, the objects are sorted (primarily) by filename and then size. A window into this sorted list is used to limit the number of objects that are considered as potential delta bases. If a “good enough”<sup>1</sup> delta representation is not found for an object among the objects in its window, then the object will not be delta compressed.</p>
<p>The size of the window is controlled by the <code>--window=</code> option of <code>git pack-objects</code>, or the <code>pack.window</code> configuration variable. The maximum depth of a delta chain is controlled by the <code>--depth=</code> option of <code>git pack-objects</code>, or the <code>pack.depth</code> configuration variable. The <code>--aggressive</code> option of <code>git gc</code> greatly enlarges both the window size and the maximum depth to attempt to create a smaller pack file.</p>
<p>The filename sort clumps together the objects for entries with identical names (or at least similar endings, e.g. <code>.c</code>). 
The size sort is from largest to smallest so that deltas that remove data are preferred to deltas that add data (since removal deltas have shorter representations) and so that the earlier, larger objects (usually newer) tend to be represented with plain compression.</p>
<p><sup>1</sup> What qualifies as “good enough” depends on the size of the object in question and its potential delta base as well as how deep its resulting delta chain would be.</p>
<h1>Object Ordering Heuristic</h1>
<p>Objects are stored in the pack files in a “most recently referenced” order. The objects needed to reconstruct the most recent history are placed earlier in the pack and they will be close together. This usually works well for OS disk caches.</p>
<p>All the commit objects are sorted by commit date (most recent first) and stored together. This placement and ordering optimizes the disk accesses needed to walk the history graph and extract basic commit information (e.g. <code>git log</code>).</p>
<p>The tree and blob objects are stored starting with the tree from the first stored (most recent) commit. Each tree is processed in a depth-first fashion, storing any objects that have not already been stored. This puts all the trees and blobs required to reconstruct the most recent commit together in one place. Any trees and blobs that have not yet been saved but that are required for later commits are stored next, in the sorted commit order.</p>
<p>The final object ordering is slightly affected by the delta base selection in that if an object is selected for delta representation and its base object has not been stored yet, then its base object is stored immediately before the deltified object itself. This prevents likely disk cache misses due to the non-linear access required to read a base object that would have “naturally” been stored later in the pack file.</p>
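<p>The ordering rules above can be approximated with a short Python sketch. It is only an illustration of the described heuristic, not Git’s actual implementation: the <code>Commit</code> and <code>Tree</code> classes, the <code>pack_order</code> function, the <code>delta_base</code> mapping, and the sample object names are all hypothetical, and real packing handles many details that are ignored here.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    name: str
    # Entries are either nested Tree objects or blob names (plain strings).
    entries: list = field(default_factory=list)


@dataclass
class Commit:
    name: str
    date: int  # commit timestamp; larger means more recent
    tree: Tree


def pack_order(commits, delta_base=None):
    """Return object names in the storage order described in the text.

    delta_base maps a deltified object's name to the name of its base
    (a hypothetical stand-in for the result of delta base selection).
    """
    delta_base = delta_base or {}
    order, seen, walked = [], set(), set()

    def emit(name):
        if name in seen:
            return
        seen.add(name)
        base = delta_base.get(name)
        if base is not None:
            emit(base)  # a not-yet-stored base goes immediately before its delta
        order.append(name)

    def walk(tree):
        emit(tree.name)
        if tree.name in walked:
            return  # this subtree's contents were already stored
        walked.add(tree.name)
        for entry in tree.entries:  # depth-first traversal
            if isinstance(entry, Tree):
                walk(entry)
            else:
                emit(entry)

    newest_first = sorted(commits, key=lambda c: c.date, reverse=True)
    for commit in newest_first:  # all commits are stored together, newest first
        emit(commit.name)
    for commit in newest_first:  # then trees and blobs, in the same commit order
        walk(commit.tree)
    return order


# Hypothetical two-commit history: c2 is newer and changes only README.
sub = Tree("sub", ["main.c@v1"])
t1 = Tree("t1", ["README@v1", sub])
t2 = Tree("t2", ["README@v2", sub])
commits = [Commit("c1", 100, t1), Commit("c2", 200, t2)]

print(pack_order(commits))
print(pack_order(commits, {"README@v2": "README@v1"}))
```

<p>In the first call, everything needed for the newest commit (<code>t2</code>, <code>README@v2</code>, <code>sub</code>, <code>main.c@v1</code>) is listed before the objects that only the older commit needs. In the second call, <code>README@v2</code> is assumed to be stored as a delta against <code>README@v1</code>, so the base is pulled forward to sit directly before it.</p>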