<p>Getting an additional factor of 2 out of this shouldn't be hard if you just write the code directly. Part of the gain might come from removing some of the generality of inner_product, but some would also come from eliminating the deques: if you keep a pointer into your input array, you can index off it and off the filter array in the inner loop, and increment the input pointer in the outer loop.</p>
<p>Each of those inner_products has to use iterators through deques.</p>
<p>Most of the (coding) effort then becomes handling the edge conditions.</p>
<p>And take that division out of there - it should be a multiplication by a constant calculated outside the loop.</p>
<p>inner_product itself is pretty efficient (there's not much to do there), but it needs to increment two iterators on each pass through the inner loop. There is no explicit loop unrolling, but a good compiler can unroll a loop that simple, and a compiler is more likely to know how far to unroll a loop before running into instruction cache issues.</p>
<p>Deque iterators are not nearly as efficient as ++ on a plain pointer. There is at least a test on every ++, and there may be more than one assignment.</p>
<p>This is what a simple FIR filter can look like, without the code for the edge conditions (which goes outside of the loop):</p>
<pre><code>double norm = 1.0/sum;          // multiply by this instead of dividing in the loop
double *p = data.values();      // start of input data
double *q = output.values();    // start of output buffer
int width = data.size() - filter.size();
for( int i = 0; i &lt; width; ++i )
{
    double *f = filter.values();
    double accumulator = ( f[0] * p[0] );
    for( int j = 1; j &lt; filter.size(); ++j )
    {
        accumulator += ( f[j] * p[j] );   // index by j, the inner loop counter
    }
    *q++ = accumulator * norm;
    ++p;                                  // advance the input window by one sample
}
</code></pre>
<p>Note that there are messy details left out, and this is not the same as your filter, but it gives the idea. What's inside the outer loop easily fits in a modern instruction cache. The inner loop may be unrolled by the compiler. Most modern architectures can do the add and multiply in parallel.</p>
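<p>For something that compiles on its own, here is a minimal sketch of the same idea using std::vector for storage. The function name fir_valid, its signature, and the choice to return only the positions where the filter fits entirely inside the input are assumptions made for illustration and are not part of the code above. Like that loop, it applies the filter coefficients in stored order (no reversal) and leaves the edge conditions to the caller.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are illustrative assumptions).
// Computes the filter output only where the filter lies fully inside the
// input; edge handling is deliberately left out.
std::vector<double> fir_valid(const std::vector<double>& data,
                              const std::vector<double>& filter,
                              double sum)
{
    assert(!filter.empty() && sum != 0.0);
    if (data.size() < filter.size()) {
        return {};                              // filter never fits: no output
    }

    const double norm = 1.0 / sum;              // one division, outside the loops
    const std::size_t taps = filter.size();
    const std::size_t width = data.size() - taps + 1;  // fully covered positions

    std::vector<double> output(width);
    const double* p = data.data();              // start of the current input window
    const double* f = filter.data();            // filter coefficients
    double* q = output.data();                  // next output slot

    for (std::size_t i = 0; i < width; ++i, ++p) {   // slide the window by one
        double accumulator = f[0] * p[0];
        for (std::size_t j = 1; j < taps; ++j) {
            accumulator += f[j] * p[j];         // plain pointer indexing, no iterators
        }
        *q++ = accumulator * norm;              // multiply by the precomputed constant
    }
    return output;
}
```

<p>Under these assumptions, calling fir_valid with the input 1, 2, 3, 4, 5, the filter 1, 2, 1 and sum 4 gives three outputs: 2, 3 and 4.</p>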