Looking for a better compression technique
I'm compressing a binary stream that is made of packets.

A packet is composed of 256 32-bit integers (samples). Most samples change only a few bits from the previous sample (typically 0-4 bits change from one sample to the next).

Here is an example:

<pre><code>
3322 2222 2222 1111 1111 1100 0000 0000   BIT POSITIONS
1098 7654 3210 9876 5432 1098 7654 3210
--------------------------------------------------------
1100 1001 1110 0010 0001 0101 0110 1101   Sample 1
               *                   *
1100 1001 1110 1010 0001 0101 0110 0101   Sample 2   changes: bits 19, 3
1100 1001 1110 1010 0001 0101 0110 0101   Sample 3   changes: none
     *            *            *
1100 0001 1110 1011 0001 0101 0010 0101   Sample 4   changes: bits 27, 16, 6
...
</code></pre>

My current, lossless compression scheme is based on nibbles. I use a control byte in which single bits encode which nibbles changed from the previous sample; if a nibble changed, its new value is included in the compressed stream, otherwise it is reconstructed from the previous sample during decompression.

Here is how the example stream above would be compressed:

<pre><code>
Control Byte: 11111111   // all nibbles change, since this is the first sample
Data: 1100 1001 1110 0010 0001 0101 0110 1101   // data for all nibbles

Control Byte: 00010001   // only nibbles 3 and 7 have changes
Data: 1010 0101          // data for nibbles 3 and 7

Control Byte: 00000000   // no nibbles change
Data:                    // no data is required

Control Byte: 01010010   // nibbles 1, 3 and 6 have changes
Data: 0001 1011 0010     // data for nibbles 1, 3 and 6
...
</code></pre>

With this scheme there is a fixed overhead of 256 bytes per packet (the control bytes) plus an average of 260 bytes of variable-length data (the nibbles that change from sample to sample). Since an uncompressed packet is 1024 bytes, this works out to roughly 50% average compression.

This is not bad, but my gut feeling is that a much better approach is possible. Is anybody aware of a better compression strategy that exploits the fact that very few bits change from sample to sample? Lossy compression is an alternative as long as the bit-error rate after decompression is small (less than 3%). For this particular data stream the numerical weight of the bit positions is irrelevant, so an error occurring in the higher bits is of no concern at all.

Thanks everyone in advance!
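For reference, here is a rough sketch of what the current compressor does for one packet (simplified, not my actual code). Nibble 0 is the most significant nibble of a sample, and its flag is the most significant bit of the control byte; unlike the example stream above, this sketch writes all 256 control bytes first and packs the changed nibbles after them, two per byte:

<pre><code>
#include &lt;stdint.h&gt;
#include &lt;stddef.h&gt;

typedef struct {
    uint8_t *buf;      /* start of the nibble-data area */
    size_t   count;    /* nibbles written so far */
} nibble_writer;

static void put_nibble(nibble_writer *w, uint8_t n)
{
    if (w-&gt;count % 2 == 0)
        w-&gt;buf[w-&gt;count / 2] = (uint8_t)(n &lt;&lt; 4);       /* high half first */
    else
        w-&gt;buf[w-&gt;count / 2] |= (uint8_t)(n &amp; 0x0F);
    w-&gt;count++;
}

/* Compresses one packet of 256 samples into `out` (must hold the worst case
 * of 256 + 1024 bytes). Returns the compressed size in bytes. */
size_t compress_packet(const uint32_t samples[256], uint8_t *out)
{
    nibble_writer data = { out + 256, 0 };   /* nibble data follows the control bytes */
    uint32_t prev = 0;

    for (int i = 0; i &lt; 256; i++) {
        uint32_t cur  = samples[i];
        /* the first sample has no predecessor, so every nibble is "changed" */
        uint32_t diff = (i == 0) ? 0xFFFFFFFFu : (cur ^ prev);
        uint8_t  ctrl = 0;

        for (int nib = 0; nib &lt; 8; nib++) {
            int shift = (7 - nib) * 4;                  /* nibble 0 = bits 31..28 */
            if ((diff &gt;&gt; shift) &amp; 0xFu) {
                ctrl |= (uint8_t)(1u &lt;&lt; (7 - nib));     /* flag this nibble as changed */
                put_nibble(&amp;data, (uint8_t)((cur &gt;&gt; shift) &amp; 0xFu));
            }
        }
        out[i] = ctrl;
        prev = cur;
    }

    /* 256 control bytes + the packed changed nibbles, rounded up to a byte */
    return 256 + (data.count + 1) / 2;
}
</code></pre>

Decompression mirrors this: for each sample, read its control byte and overwrite only the flagged nibbles of the previous sample with the next nibbles from the data area.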