Fill histograms (array reduction) in parallel with OpenMP without using a critical section
<p>I would like to fill histograms in parallel using OpenMP. I have come up with two different methods of doing this with OpenMP in C/C++.</p>
<p>The first method, <code>proccess_data_v1</code>, makes a private histogram variable <code>hist_private</code> for each thread, fills them in parallel, and then sums the private histograms into the shared histogram <code>hist</code> in a <code>critical</code> section.</p>
<p>The second method, <code>proccess_data_v2</code>, makes a shared array of histograms with array size equal to the number of threads, fills this array in parallel, and then sums the per-thread histograms into the shared histogram <code>hist</code> in parallel.</p>
<p>The second method seems superior to me since it avoids a critical section and sums the histograms in parallel. However, it requires knowing the number of threads and calling <code>omp_get_thread_num()</code>. I generally try to avoid this. Is there a better way to do the second method without referencing the thread numbers and using a shared array with size equal to the number of threads?</p>
<pre><code>#include &lt;omp.h&gt;

void proccess_data_v1(float *data, int *hist, const int n, const int nbins, float max) {
    #pragma omp parallel
    {
        // Each thread fills its own private histogram.
        int *hist_private = new int[nbins];
        for(int i=0; i&lt;nbins; i++) hist_private[i] = 0;
        #pragma omp for nowait
        for(int i=0; i&lt;n; i++) {
            float x = reconstruct_data(data[i]);
            fill_hist(hist_private, nbins, max, x);
        }
        // Merge the private histograms into the shared one, one thread at a time.
        #pragma omp critical
        {
            for(int i=0; i&lt;nbins; i++) {
                hist[i] += hist_private[i];
            }
        }
        delete[] hist_private;
    }
}

void proccess_data_v2(float *data, int *hist, const int n, const int nbins, float max) {
    // Assumes the runtime really provides nthreads threads; rows of hista
    // belonging to threads that never start would stay uninitialized.
    const int nthreads = 8;
    omp_set_num_threads(nthreads);
    int *hista = new int[nbins*nthreads];

    #pragma omp parallel
    {
        const int ithread = omp_get_thread_num();
        for(int i=0; i&lt;nbins; i++) hista[nbins*ithread+i] = 0;
        #pragma omp for
        for(int i=0; i&lt;n; i++) {
            float x = reconstruct_data(data[i]);
            fill_hist(&amp;hista[nbins*ithread], nbins, max, x);
        }

        // Sum the per-thread histograms; the bins are divided among the threads.
        #pragma omp for
        for(int i=0; i&lt;nbins; i++) {
            for(int t=0; t&lt;nthreads; t++) {
                hist[i] += hista[nbins*t + i];
            }
        }
    }
    delete[] hista;
}
</code></pre>
<p><strong>Edit:</strong> Based on a suggestion by @HristoIliev I have created an improved method called <code>proccess_data_v3</code>.</p>
<pre><code>#include &lt;omp.h&gt;
#include &lt;xmmintrin.h&gt;  // _mm_malloc, _mm_free

#define ROUND_DOWN(x, s) ((x) &amp; ~((s)-1))

void proccess_data_v3(float *data, int *hist, const int n, const int nbins, float max) {
    int* hista;
    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();

        int lda = ROUND_DOWN(nbins+1023, 1024);  //1024 ints = 4096 bytes -&gt; round to a multiple of page size
        // One thread allocates; the implicit barrier after single makes hista visible to all.
        #pragma omp single
        hista = (int*)_mm_malloc(lda*sizeof(int)*nthreads, 4096);  //align memory to page size

        for(int i=0; i&lt;nbins; i++) hista[lda*ithread+i] = 0;
        #pragma omp for
        for(int i=0; i&lt;n; i++) {
            float x = reconstruct_data(data[i]);
            fill_hist(&amp;hista[lda*ithread], nbins, max, x);
        }

        #pragma omp for
        for(int i=0; i&lt;nbins; i++) {
            for(int t=0; t&lt;nthreads; t++) {
                hist[i] += hista[lda*t + i];
            }
        }
    }
    _mm_free(hista);
}
</code></pre>
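<p>For comparison, here is a minimal sketch of one way to avoid both the critical section and any reference to thread numbers, assuming a compiler that supports OpenMP 4.5 or later: a <code>reduction</code> clause on an array section lets the runtime create and merge the per-thread copies itself. The function name <code>process_data_reduction</code> is hypothetical, and the bodies of <code>reconstruct_data</code> and <code>fill_hist</code> are stand-ins, since the question does not show them.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: the question does not show reconstruct_data.
static float reconstruct_data(float raw) { return raw; }

// Hypothetical stand-in for fill_hist: counts x into one of nbins equal-width
// bins covering [0, max); values outside that range are ignored.
static void fill_hist(int *hist, int nbins, float max, float x) {
    if (x < 0.0f || x >= max) return;
    int bin = static_cast<int>(x / max * nbins);
    if (bin >= nbins) bin = nbins - 1;  // guard against rounding at the upper edge
    hist[bin]++;
}

// Adds the counts for data[0..n) to hist[0..nbins), like the versions above.
// Requires OpenMP 4.5+ for the array-section reduction; if OpenMP is disabled,
// the pragma is ignored and the loop simply runs serially.
void process_data_reduction(const float *data, int *hist, int n, int nbins, float max) {
    assert(nbins > 0);
    // Each thread gets a zero-initialized private copy of hist[0..nbins);
    // the copies are summed into the original array when the loop ends.
    #pragma omp parallel for reduction(+:hist[:nbins])
    for (int i = 0; i < n; i++) {
        float x = reconstruct_data(data[i]);
        fill_hist(hist, nbins, max, x);  // hist refers to the private copy here
    }
}
```

<p>Calling it with a <code>std::vector&lt;int&gt; hist(nbins, 0)</code> and passing <code>hist.data()</code> gives the same totals as the serial loop. How the private copies are allocated and merged is left to the implementation, so whether this beats the hand-written <code>hista</code> approach for a large <code>nbins</code> would need to be measured.</p>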