The first thing you want to do is test any possible solution against your current code.

I think you may want to try to get rid of data dependencies, to allow the processor to do more work at the same time.

**What are data dependencies?** As data flows through your function, the current value of `n` depends on the previous value of `n`, which depends on the value before that... which is a long chain of data dependencies. In the code below, `n` is never modified, so the processor can "skip ahead" and do a couple of different things at the same time without having to wait for the new `n` to be computed.

```c
// NOTE: This code is actually incorrect, as caf noted.
// The byte order is reversed.
size_t compress_unsigned_int(unsigned int n, char *data)
{
    if (n < (1U << 14)) {
        if (n < (1U << 7)) {
            /* fits in 7 bits: 1 byte */
            data[0] = n;
            return 1;
        } else {
            /* fits in 14 bits: 2 bytes */
            data[0] = (n >> 7) | 0x80;
            data[1] = n & 0x7f;
            return 2;
        }
    } else if (n < (1U << 28)) {
        if (n < (1U << 21)) {
            /* fits in 21 bits: 3 bytes */
            data[0] = (n >> 14) | 0x80;
            data[1] = ((n >> 7) & 0x7f) | 0x80;
            data[2] = n & 0x7f;
            return 3;
        } else {
            /* fits in 28 bits: 4 bytes */
            data[0] = (n >> 21) | 0x80;
            data[1] = ((n >> 14) & 0x7f) | 0x80;
            data[2] = ((n >> 7) & 0x7f) | 0x80;
            data[3] = n & 0x7f;
            return 4;
        }
    } else {
        /* full 32 bits: 5 bytes */
        data[0] = (n >> 28) | 0x80;
        data[1] = ((n >> 21) & 0x7f) | 0x80;
        data[2] = ((n >> 14) & 0x7f) | 0x80;
        data[3] = ((n >> 7) & 0x7f) | 0x80;
        data[4] = n & 0x7f;
        return 5;
    }
}
```

I tested the performance by executing it in a tight loop from 0..UINT_MAX. On my system, the execution times are:

```
(Lower is better)
Original:                100%
caf's unrolled version:   79%
My version:               57%
```

Some minor tweaking may produce better results, but I doubt you'll get much more improvement unless you go to assembly. If your integers tend to fall in specific ranges, you can use profiling to get the compiler to add the right branch prediction to each branch. This might get you a few extra percentage points of speed. (**EDIT:** I got 8% from reordering the branches, but it's a perverse optimization because it relies on the fact that each number in 0...UINT_MAX appears with equal frequency. I don't recommend it.)

SSE won't help. SSE is designed to operate on multiple pieces of data of the same width at the same time, and it is notoriously difficult to get SIMD to accelerate anything with a variable-length encoding. (It's not necessarily impossible, but if it can be done at all, you'd have to be pretty smart to figure it out.)
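
For reference, a tight-loop harness along the lines of the test described above might look like the sketch below. The `main` function, the use of `clock()`, and the checksum are my own assumptions about the setup, not the original benchmark; the checksum exists only to keep the compiler from optimizing the loop away.

```c
/* Minimal timing harness: assumes compress_unsigned_int() from the
 * answer above is defined in the same translation unit. */
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

size_t compress_unsigned_int(unsigned int n, char *data);

int main(void)
{
    char buf[5];
    size_t checksum = 0;
    clock_t start = clock();

    /* Encode every value 0..UINT_MAX exactly once. */
    unsigned int n = 0;
    do {
        checksum += compress_unsigned_int(n, buf);
    } while (n++ != UINT_MAX);

    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("checksum=%zu  time=%.2fs\n", checksum, seconds);
    return 0;
}
```

Compile each candidate version with the same optimization level (e.g. `-O2`) and run it under the same harness so the relative percentages stay comparable.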
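
On the branch-prediction point: besides full profile-guided optimization (e.g. GCC's `-fprofile-generate`/`-fprofile-use`), one way to express what profiling tells you is GCC/Clang's `__builtin_expect`. The wrapper below is only a sketch: the assumption that single-byte values dominate is illustrative, and `compress_small_biased` is a hypothetical name, not part of the answer above.

```c
/* GCC/Clang-specific branch hint. Assumes compress_unsigned_int() from
 * the answer above is available, and that small inputs are the common
 * case -- something you would confirm by profiling your real data. */
#include <stddef.h>

#define LIKELY(x) __builtin_expect(!!(x), 1)

size_t compress_unsigned_int(unsigned int n, char *data);

size_t compress_small_biased(unsigned int n, char *data)
{
    if (LIKELY(n < (1U << 7))) {        /* hinted fast path: 1-byte values */
        data[0] = n;
        return 1;
    }
    return compress_unsigned_int(n, data);  /* general path for larger values */
}
```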