Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Based on Floris's fast function I tried to see if I could find a way to find a faster solution with OpenMP. I came up with two functions <code>foo_v2</code> and <code>foo_v3</code> which are faster for larger arrays, <code>foo_v2</code> is faster independent of density and <code>foo_v3</code> is faster for sparser arrays. The function <code>foo_v2</code> essentially creates a 2D array with width <code>N*nthreads</code> as well as an array <code>countsa</code> which contains the counts for each thread. This is better explained with code. The following code would loop over all the elements written out to T.</p> <pre><code>for(int ithread=0; ithread&lt;nthreads; ithread++) { for(int i=0; i&lt;counta[ithread]; i++) { T[ithread*N/nthread + i] } } </code></pre> <p>The function<code>foo_v3</code>creates a 1D array as requested. In all cases<code>N</code>has to be pretty large to overcome the OpenMP overhead. The code below defaults to 256MB with a density of<code>M</code>about 10%. The OpenMP functions are both faster by over a factor of 2 on my 4 core Sandy Bridge system. If you put the density at 50%<code>foo_v2</code>is faster still by about about a factor of 2 but<code>foo_v3</code>is no longer faster. </p> <pre><code>#include &lt;stdio.h&gt; #include &lt;stdlib.h&gt; #include &lt;omp.h&gt; int foo_v1(int *M, int *T, const int N) { int count = 0; for(int i = 0; i&lt;N; i++) { T[count] = i; count += M[i]; } return count; } int foo_v2(int *M, int *T, int *&amp;counta, const int N) { int nthreads; #pragma omp parallel { nthreads = omp_get_num_threads(); const int ithread = omp_get_thread_num(); #pragma omp single counta = new int[nthreads]; int count_private = 0; #pragma omp for for(int i = 0; i&lt;N; i++) { T[ithread*N/nthreads + count_private] = i; count_private += M[i]; } counta[ithread] = count_private; } return nthreads; } int foo_v3(int *M, int *T, const int N) { int count = 0; int *counta = 0; #pragma omp parallel reduction(+:count) { const int nthreads = omp_get_num_threads(); const int ithread = omp_get_thread_num(); #pragma omp single { counta = new int[nthreads+1]; counta[0] = 0; } int *Tprivate = new int[N/nthreads]; int count_private = 0; #pragma omp for nowait for(int i = 0; i&lt;N; i++) { Tprivate[count_private] = i; count_private += M[i]; } counta[ithread+1] = count_private; count += count_private; #pragma omp barrier int offset = 0; for(int i=0; i&lt;(ithread+1); i++) { offset += counta[i]; } for(int i=0; i&lt;count_private; i++) { T[offset + i] = Tprivate[i]; } delete[] Tprivate; } delete[] counta; return count; } void compare(const int *T1, const int *T2, const int N, const int count, const int *counta, const int nthreads) { int diff = 0; int n = 0; for(int ithread=0; ithread&lt;nthreads; ithread++) { for(int i=0; i&lt;counta[ithread]; i++) { int i2 = N*ithread/nthreads+i; //printf("%d %d\n", T1[n], T2[i2]); int tmp = T1[n++] - T2[i2]; if(tmp&lt;0) tmp*=-1; diff += tmp; } } printf("diff %d\n", diff); } void compare_v2(const int *T1, const int *T2, const int count) { int diff = 0; int n = 0; for(int i=0; i&lt;count; i++) { int tmp = T1[i] - T2[i]; //if(tmp!=0) printf("%i %d %d\n", i, T1[i], T2[i]); if(tmp&lt;0) tmp*=-1; diff += tmp; } printf("diff %d\n", diff); } int main() { const int N = 1 &lt;&lt; 26; printf("%f MB\n", 4.0*N/1024/1024); int *M = new int[N]; int *T1 = new int[N]; int *T2 = new int[N]; int *T3 = new int[N]; int *counta; double dtime; for(int i=0; i&lt;N; i++) { M[i] = ((rand()%10)==0); } //int repeat = 10000; int repeat = 1; int count1, count2; int nthreads; dtime = omp_get_wtime(); for(int i=0; i&lt;repeat; i++) count1 = foo_v1(M, T1, N); dtime = omp_get_wtime() - dtime; printf("time v1 %f\n", dtime); dtime = omp_get_wtime(); for(int i=0; i&lt;repeat; i++) nthreads = foo_v2(M, T2, counta, N); dtime = omp_get_wtime() - dtime; printf("time v2 %f\n", dtime); compare(T1, T2, N, count1, counta, nthreads); dtime = omp_get_wtime(); for(int i=0; i&lt;repeat; i++) count2 = foo_v3(M, T3, N); dtime = omp_get_wtime() - dtime; printf("time v2 %f\n", dtime); printf("count1 %d, count2 %d\n", count1, count2); compare_v2(T1, T3, count1); } </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload