<p>So you can store 8 different values in 3 bits, that's right. You can therefore fit 2 symbols (2x3 bits) in one byte, or 8 symbols (8x3 bits) in 24/8 = 3 bytes, while storing them as characters 'A' - 'H' would fit only 3 characters into those 24 bits.</p>
<p>That's a saving by a factor of 8/3, so 1.5 billion bytes would do. However, if the length of the sequence you have to store isn't divisible by 8, you will have some unused bits in the last byte of your code. To make a short example:</p>
<p>You store BA, which is 001 000, and have to pad the byte to 0010:0000. How would you distinguish it from B? B is just 001, so padded, it is 0010:0000 too.</p>
<p>OK. For a file of 1.5 GB, you could simply always append a single byte that tells you how many bits of the last byte are to be used. In the example above, you would append 6 in one case and 3 in the other.</p>
<p>But now suppose you have to insert something. You always step through the binary sequence 3 bits at a time, but if the number of inserted symbols isn't a multiple of 8 (so the inserted bit length isn't divisible by 8), you can't just read the following bytes and append them to your bit stream. You would have to transpose every following byte: cut it into 2 pieces, append the first part to your overrun, and keep the second part as overrun for the next byte.</p>
<p>The implementation might not be too tricky, but I don't know how the runtime would be affected.</p>
<p>Maybe a statistical analysis can help: How often are characters appended, and how many? How often are they inserted, and in what sizes?</p>
<p>Maybe it would be easier to organize the file in chunks, say 1000 files of 2 MB each, each containing a free buffer to append to. The last bytes of each file could specify how many bytes count as content.</p>
<p>How is the insertion of data specified? Will it be "insert the sequence AHA at position 2,713,345,947"? Or will it be "insert BACH after the 3rd occurrence of FACHDAG"? Are there typical, often-repeated sequences, like words in natural language?</p>
<p>In the first case, an external index could be very useful. If you could look up which of the 1000 files contains position 2,713,345,947, you could skip reading, on average, 50% of your 1.5 GB, which could improve your speed a lot.</p>
<p>But you would need a statistical analysis: Will the file grow, or more or less stay the same size? How often is it read and written? Are updates inserts, appends, or deletions?</p>
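A minimal sketch of the ideas above, under stated assumptions: the eight symbols are 'A'..'H' coded 0..7 (so A = 000 and B = 001, matching the example), the whole sequence fits in memory, and the helpers `pack`, `unpack`, `locate` and `record_insert` plus the `chunk_starts` index are hypothetical names chosen for illustration. `pack` writes 3 bits per symbol and appends the trailing "bits used in the last byte" count; `locate` is one possible external index for the chunked layout, mapping a global position to (chunk, offset) with a binary search.

```python
from bisect import bisect_right

# Assumption: symbols 'A'..'H' map to codes 0..7 (A = 000, B = 001, ...).
ALPHABET = "ABCDEFGH"
CODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack(seq: str) -> bytes:
    """Pack 3 bits per symbol, MSB first, then append one trailer byte
    holding the number of valid bits in the last data byte."""
    out = bytearray()
    acc, nbits = 0, 0  # pending bits and how many of them there are
    for ch in seq:
        acc = (acc << 3) | CODE[ch]
        nbits += 3
        while nbits >= 8:  # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    if nbits:  # left-align leftover bits, zero-pad the rest of the byte
        out.append(acc << (8 - nbits))
        used = nbits
    else:
        used = 8 if out else 0
    out.append(used)  # trailer: valid bits in the last data byte
    return bytes(out)


def unpack(data: bytes) -> str:
    """Inverse of pack: read 3-bit codes until the trailer's bit count is reached."""
    body, used = data[:-1], data[-1]
    if not body:
        return ""
    n = (8 * (len(body) - 1) + used) // 3  # number of symbols stored
    out = []
    acc, nbits = 0, 0
    for byte in body:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 3 and len(out) < n:
            nbits -= 3
            out.append(ALPHABET[(acc >> nbits) & 0b111])
            acc &= (1 << nbits) - 1
    return "".join(out)


def locate(chunk_starts: list[int], position: int) -> tuple[int, int]:
    """Hypothetical chunk index: chunk_starts[i] is the global position of
    the first symbol in chunk i (chunk_starts[0] == 0). Returns (chunk, offset)."""
    i = bisect_right(chunk_starts, position) - 1
    if i < 0:
        raise IndexError("position before start of data")
    return i, position - chunk_starts[i]


def record_insert(chunk_starts: list[int], chunk: int, count: int) -> None:
    """After inserting `count` symbols into `chunk`, shift the start
    positions of all later chunks; only the index changes, not their bytes."""
    for j in range(chunk + 1, len(chunk_starts)):
        chunk_starts[j] += count
```

For example, `pack("BA")` and `pack("B")` both begin with the byte 0x20 (0010:0000); only the trailer, 6 versus 3, tells them apart. With chunks, an insertion still requires re-packing the bytes inside the affected chunk, but `record_insert` only touches about 1000 index entries instead of shifting the rest of the 1.5 GB.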