SSE instruction within nested for loops
    <p>I have several nested for loops in my code, and I am trying to use Intel SSE instructions on an Intel Core i7 to speed up the application. The code structure is as follows (val is set in a higher for loop):</p>
<pre><code>__m128 in1, in2, tmp1, tmp2, out;
float arr[4] __attribute__ ((aligned(16)));
val = ...;
... several higher for loops ...
for(f=0; f&lt;=fend; f=f+4){
    index2 = ...;
    for(i=0; i&lt;iend; i++){
        for(j=0; j&lt;jend; j++){
            inputval = ...;
            index = ...;
            if(f&lt;fend-4){
                arr[0] = array[index];
                arr[1] = array[index+val];
                arr[2] = array[index+2*val];
                arr[3] = array[index+3*val];
                in1 = _mm_load_ps(arr);
                in2 = _mm_set_ps1(inputval);
                tmp1 = _mm_mul_ps(in1, in2);
                tmp2 = _mm_loadu_ps(&amp;array2[index2]);
                out = _mm_add_ps(tmp1, tmp2);
                _mm_storeu_ps(&amp;array2[index2], out);
            } else {
                // if fewer than 4 values are available for SSE execution, run the serial code
                for(int u = 0; u &lt; fend-f; u++)
                    array2[index2+u] += array[index+u*val] * inputval;
            }
        }
    }
}
</code></pre>
    <p>I think there are two main problems: the buffer used for aligning the values from 'array', and the case where fewer than 4 values are left (e.g. when fend = 6, two values are left over, which have to be processed by the sequential code). Is there any other way of loading the values into in1, and/or of executing SSE instructions with only 3 or 2 values?</p>
    <hr>
    <p>Thanks for the answers so far. The loading is as good as it gets, I think, but is there any workaround for the 'leftover' part within the else statement that could be solved using SSE instructions?</p>
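One possible way to handle the leftover part with SSE, sketched here under assumptions rather than offered as a definitive answer: pad the 1 to 3 remaining values with zeros in small temporary buffers, run the same multiply-add on the padded vector, and copy back only the valid lanes. The helper name strided_madd and the buffers src_buf and dst_buf are hypothetical and not part of the original code. The full groups of 4 use _mm_set_ps to gather the strided values directly, which avoids the explicit aligned buffer. Whether the padded tail is actually faster than the plain serial loop is not guaranteed and would have to be measured.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper (not from the original post):
 * computes array2[index2 + u] += array[index + u*val] * inputval
 * for u in [0, count). Full groups of 4 use SSE directly; the 1-3
 * leftover elements go through zero-padded temporary buffers. */
static void strided_madd(float *array2, int index2,
                         const float *array, int index, int val,
                         float inputval, int count)
{
    assert(count >= 0);

    const __m128 scale = _mm_set1_ps(inputval);
    int u = 0;

    /* Full vectors: gather four strided values without a staging array. */
    for (; u + 4 <= count; u += 4) {
        const float *src = array + index + u * val;
        /* _mm_set_ps takes its arguments from the highest lane to the lowest. */
        __m128 in  = _mm_set_ps(src[3 * val], src[2 * val], src[val], src[0]);
        __m128 acc = _mm_loadu_ps(array2 + index2 + u);
        acc = _mm_add_ps(acc, _mm_mul_ps(in, scale));
        _mm_storeu_ps(array2 + index2 + u, acc);
    }

    /* Tail: pad the remaining 1-3 values with zeros, then copy back
     * only the valid lanes so nothing past the end is read or written. */
    int rest = count - u;
    if (rest > 0) {
        float src_buf[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        float dst_buf[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int k = 0; k < rest; k++) {
            src_buf[k] = array[index + (u + k) * val];
            dst_buf[k] = array2[index2 + u + k];
        }
        __m128 acc = _mm_add_ps(_mm_loadu_ps(dst_buf),
                                _mm_mul_ps(_mm_loadu_ps(src_buf), scale));
        _mm_storeu_ps(dst_buf, acc);
        for (int k = 0; k < rest; k++)
            array2[index2 + u + k] = dst_buf[k];
    }
}
```

In the loop structure from the question, a call such as strided_madd(array2, index2, array, index, val, inputval, fend - f) would stand in for the whole if/else body, assuming fend - f is the number of values still to be processed at that point.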