`SequenceFile` is a key/value pair file format implemented in Hadoop. Even though HBase uses `SequenceFile` to store its write-ahead logs, `SequenceFile`'s block compression implementation is not used there.

The `Compression` class is part of Hadoop's compression framework and as such is used in HBase's HFile block compression.

HBase already has built-in compression of the following types:

- **HFile block compression on disk.** This uses Hadoop's codec framework and supports compression algorithms such as LZO, GZIP, and SNAPPY. This type of compression is applied only to HFile blocks stored on disk, because the whole block has to be decompressed to retrieve key/value pairs. It is configured per column family (see the sketch below).
- **In-cache key compression**, called "data block encoding" in HBase terminology; see [HBASE-4218](https://issues.apache.org/jira/browse/HBASE-4218). The implemented encoding algorithms include various types of prefix and delta encoding, and trie encoding is being implemented as of this writing ([HBASE-4676](https://issues.apache.org/jira/browse/HBASE-4676)). Data block encoding algorithms exploit the redundancy between sorted keys in an HFile block and store only the differences between consecutive keys. They currently do not touch values, so they are mostly useful when values are small relative to keys, e.g. counters. Because these algorithms are lightweight, it is possible to efficiently decode only the necessary part of a block to retrieve the requested key or advance to the next one, which is why they are good for improving cache efficiency. On some real-world datasets, however, delta encoding also saves up to 50% on top of LZO compression (i.e. applying delta encoding and then LZO vs. LZO alone), achieving significant savings on disk as well.
- **A custom dictionary-based write-ahead log compression**, implemented in [HBASE-4608](https://issues.apache.org/jira/browse/HBASE-4608). Note: even though `SequenceFile` is used for write-ahead log storage in HBase, `SequenceFile`'s built-in block compression cannot be used for the write-ahead log, because buffering key/value pairs for block compression would cause data loss. (A configuration sketch for enabling WAL compression also follows below.)

HBase RPC compression is a work in progress. As you mentioned, compressing the key/value pairs passed between client and HBase can save bandwidth and improve HBase performance. This has been implemented in Facebook's version of HBase, 0.89-fb ([HBASE-5355](https://issues.apache.org/jira/browse/HBASE-5355)), but has yet to be ported to the official Apache HBase trunk. The RPC compression algorithms supported in HBase 0.89-fb are the same as those supported by the Hadoop compression framework (e.g. GZIP and LZO).

The `setCompressedMapOutput` method is a MapReduce configuration method and is not really relevant to HBase compression.
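Since the first two mechanisms are configured per column family, here is a minimal sketch of turning both on through the Java client. It is written against the 0.94-era API (class and package names moved around in later releases), and the table name `t1` and family name `d` are made-up placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CompressionSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // "d" is a hypothetical column family name.
        HColumnDescriptor family = new HColumnDescriptor("d");

        // On-disk HFile block compression via Hadoop's codec framework.
        family.setCompressionType(Compression.Algorithm.GZ);

        // In-cache key compression ("data block encoding", HBASE-4218).
        family.setDataBlockEncoding(DataBlockEncoding.PREFIX);

        // "t1" is a hypothetical table name.
        HTableDescriptor table = new HTableDescriptor("t1");
        table.addFamily(family);
        admin.createTable(table);
        admin.close();
    }
}
```

The same attributes can also be set from the HBase shell when creating the table, e.g. `create 't1', {NAME => 'd', COMPRESSION => 'GZ', DATA_BLOCK_ENCODING => 'PREFIX'}`.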
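The dictionary-based WAL compression from HBASE-4608, by contrast, is a cluster-wide switch rather than a per-family attribute. To the best of my knowledge it is controlled by the `hbase.regionserver.wal.enablecompression` property, normally set in `hbase-site.xml` on every region server; the snippet below only demonstrates the property name and type programmatically:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WalCompressionSwitch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // In practice this belongs in hbase-site.xml; it is set here
        // only to show the property name and its boolean type.
        conf.setBoolean("hbase.regionserver.wal.enablecompression", true);

        System.out.println("WAL compression enabled: "
                + conf.getBoolean("hbase.regionserver.wal.enablecompression", false));
    }
}
```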