    <p><strike>One thing that is definitely hurting you in your C++ code is that it has a boatload of <code>char</code> to <code>int</code> conversions. By boatload, I mean up to 2*2782*4000*128 of them. Those <code>char</code> to <code>int</code> conversions are slow, very slow.</strike></p>
    <p><strike>You can reduce this to (2782+4000)*128 such conversions by allocating a pair of <code>int</code> arrays, one 2782*128 and the other 4000*128, to contain the cast-to-integer contents of your <code>char* a</code> and <code>char* b</code> arrays. Work with these <code>int*</code> arrays rather than your <code>char*</code> arrays.</strike></p>
    <p><strike>Another problem might be your use of <code>int</code> versus <code>long</code>. I don't work on Windows, so this might not be applicable. On the machines I work on, <code>int</code> is 32 bits and <code>long</code> is now 64 bits. 32 bits is more than enough because 255*255*128 &lt; 256*256*128 = 2<sup>23</sup>.</strike></p>
    <p><strong>That obviously isn't the problem.</strong></p>
    <p>What's striking is that the code in question is not calculating that huge 2782 by 4000 array that the Matlab code is creating. What's even more striking is that Matlab is most likely doing this with doubles rather than ints, and it's still beating the pants off the C/C++ code.</p>
    <p>One big problem is cache. That 4000*128 array is far too big for level 1 cache, and you are iterating over that big array 2782 times. Your code is spending far too much time waiting on memory. To overcome this problem, work with smaller chunks of the <code>b</code> array so that your code stays in level 1 cache for as long as possible.</p>
    <p>Another problem is the optimization <code>if (distance&gt;min_distance) break;</code>. I suspect that this is actually a dis-optimization. Having <code>if</code> tests inside your innermost loop is oftentimes a bad idea. Blast through that inner product as fast as possible. Other than wasted computations, there is no harm in getting rid of this test. Sometimes it is better to make apparently unneeded computations if doing so removes a branch from an innermost loop. This is one of those cases. <strong>You might be able to solve your problem just by eliminating this test.</strong> Try doing that.</p>
    <p>Getting back to the cache problem, you need to get rid of this branch so that you can split the operations over the <code>a</code> and <code>b</code> matrices into smaller chunks of no more than 256 rows at a time. That's how many rows of 128 unsigned chars (256*128 bytes = 32 KB) fit into one of the two L1 caches on a modern Intel chip. Since 250 divides 4000, look into logically splitting that <code>b</code> matrix into 16 chunks of 250 rows each. You may well want to form that big 2782 by 4000 array of inner products, but do so in small chunks. You can add that <code>if (distance&gt;min_distance) break;</code> back in, but do so at a chunk level rather than at the byte-by-byte level.</p>
    <p>You should be able to beat Matlab because it almost certainly is working with doubles, whereas you can work with unsigned chars and ints.</p>
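    <p>To make the chunking idea concrete, here is a minimal sketch, not your actual code. It assumes the quantity being minimized is a squared Euclidean distance between rows of 128 unsigned chars (the original loop isn't shown here, so that is a guess). The names <code>squared_distance</code>, <code>nearest_rows</code>, <code>kDim</code> and <code>kChunkRows</code> are made up for illustration. The innermost loop has no <code>if</code> test at all, and the comparison against the running minimum happens once per row of <code>b</code> instead of once per byte.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr std::size_t kDim = 128;        // bytes per row, as in the question
constexpr std::size_t kChunkRows = 250;  // rows of b per chunk; 250 divides 4000

// Assumed metric: squared Euclidean distance between two rows.
// No branch inside the loop. The sum is at most 255*255*128 < 2^23,
// so a 32-bit int is plenty.
inline int squared_distance(const unsigned char* x, const unsigned char* y) {
    int sum = 0;
    for (std::size_t k = 0; k < kDim; ++k) {
        const int d = static_cast<int>(x[k]) - static_cast<int>(y[k]);
        sum += d * d;
    }
    return sum;
}

// For each row of a, return the index of the closest row of b.
// b is walked one chunk at a time, so each chunk is reused for every
// row of a while it is still (hopefully) sitting in L1 cache.
std::vector<std::size_t> nearest_rows(const std::vector<unsigned char>& a,
                                      std::size_t rows_a,
                                      const std::vector<unsigned char>& b,
                                      std::size_t rows_b) {
    assert(a.size() == rows_a * kDim);
    assert(b.size() == rows_b * kDim);

    std::vector<int> best(rows_a, std::numeric_limits<int>::max());
    std::vector<std::size_t> best_idx(rows_a, 0);

    for (std::size_t j0 = 0; j0 < rows_b; j0 += kChunkRows) {
        const std::size_t j1 = std::min(j0 + kChunkRows, rows_b);
        for (std::size_t i = 0; i < rows_a; ++i) {
            const unsigned char* ai = &a[i * kDim];
            for (std::size_t j = j0; j < j1; ++j) {
                const int d = squared_distance(ai, &b[j * kDim]);
                // The only comparison: once per row, never per byte.
                if (d < best[i]) {
                    best[i] = d;
                    best_idx[i] = j;
                }
            }
        }
    }
    return best_idx;
}
```

    <p>With 2782 rows in <code>a</code> and 4000 rows in <code>b</code>, the outer loop runs 16 times. Whether 250 rows is the best chunk size on your machine is something to measure rather than assume. Try a few sizes and time them.</p>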