StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>I see several options for your array compaction.</p> <h1>1. Separate 8-bit and 1-bit arrays</h1> <p>You can split your array into 2 parts: first one stores 8 low-order bits of your original array, second one stores '1' if value does not fit in 8 bits or '0' otherwise. This will take 9 bits per value (same space as in nightcracker's approach, but a little bit simpler). To read value from these two arrays, do the following:</p> <pre><code>int8_t array8[37*73] = {...}; uint16_t array1[(37*73+15)/16] = {...}; size_t offset = 37 * x + y; int16_t item = static_cast<int16_t>(array8[offset]); // sign extend int16_t overflow = ((array1[offset/16] >> (offset%16)) & 0x0001) << 7; item ^= overflow; </code></pre> <h1>2. Approximation</h1> <p>If you can approximate your array with some efficiently computed function (like polynomial or exponent), you can store in the array only the difference between your value and the approximation. This may require only 8 bits per value or even less.</p> <h1>3. Delta encoding</h1> <p>If your data is smooth enough, in addition to applying either of previous methods, you can store a shorter table with only part of the data values and other table, containing only differences between all values, absent in the first table, and values from the first table. This requires less bits for each value.</p> <p>For example, you can store every fifth value and differences for other values:</p> <pre><code> Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7 Short array: 0 2 3 5 6 6 Difference array: 0 1 1 2 0 0 0 1 0 1 1 2 0 0 0 1 0 0 0 0 0 1 1 1 </code></pre> <p>Alternatively, you can use differences from previous value, which requires even less bits per value:</p> <pre><code> Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7 Short array: 0 2 3 5 6 6 Delta array: 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 </code></pre> <p>Approach with delta array may be efficiently implemented using bitwise operations if a group of delta values fits exactly in int16_t.</p> <hr> <p><strong>Initialization</strong></p> <p>For option #2, preprocessor may be used. For other options, preprocessor is possible, but may be not very convenient (preprocessor is not very good to process long value lists). Some combination of preprocessor and variadic templates may be better. Or it may be easier to use some text-processing script.</p> <hr> <p><strong>Update</strong></p> <p>After looking at the actual data, I can tell some more details. Option #2 (Approximation) is not very convenient for your data. Option #1 seems to be better. Or you can use Mark Ransom's or nightcracker's approach. It doesn't matter, which one - in all cases you save 7 bits out of 16.</p> <p>Option #3 (Delta encoding) allows to save much more space. It cannot be used directly, because in some cells of the array data changes abruptly. But, as far as I know, these large changes happen at most once for each row. Which may be implemented by one additional column with full data value and one special value in the delta array.</p> <p>I noticed, that (ignoring these abrupt changes) difference between neighbor values is never more than +/- 32. This requires 6 bits to encode each delta value. This means 6.6 bits per value. 58% compression. About 2400 bytes. (Not much, but a little bit better than 2464K in your comments).</p> <p>Middle part of the array is much more smooth. You'll need only 5 bits per value to encode it separately. This may save 300..400 bytes more. Probably it's a good idea to split this array into several parts and encode each part differently.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload