<p>To expand on what Dirk Eddelbuettel has already said, it is next to impossible to both generate the same PRN sequence in parallel <em>and</em> get the desired speed-up. The root of the problem is that generating a PRN sequence is essentially a sequential process: each state depends on the previous one, which creates a backward dependence chain that reaches all the way back to the initial seeding state.</p>

<p>There are two basic solutions to this problem. One requires a lot of memory and the other a lot of CPU time, and both are really workarounds rather than true solutions:</p>

<p><strong>pregenerated PRN sequence:</strong> One thread sequentially generates a huge array of PRNs, and then all threads read from this array in a manner consistent with the sequential case (for instance, iteration <code>i</code> reads the PRN that the sequential loop would have drawn at iteration <code>i</code>). This method requires lots of memory to store the sequence. Another option is to store the sequence in a disk file that is later memory-mapped. The latter has the advantage of saving some compute time, but I/O operations are generally slow, so it only makes sense on machines with limited processing power or with small amounts of RAM.</p>

<p><strong>prewound PRNGs:</strong> This one works well when the work is distributed statically among the threads, e.g. with <code>schedule(static)</code>. Each thread has its own PRNG and all PRNGs are seeded with the same initial seed. Each thread then draws as many dummy PRNs as the index of its starting iteration, essentially prewinding its PRNG to the correct position. For example:</p>

<ul>
<li>thread 0: draws 0 dummy PRNs, then draws 100 PRNs and fills <code>out(0:99)</code></li>
<li>thread 1: draws 100 dummy PRNs, then draws 100 PRNs and fills <code>out(100:199)</code></li>
<li>thread 2: draws 200 dummy PRNs, then draws 100 PRNs and fills <code>out(200:299)</code></li>
</ul>

<p>and so on. This method works well when each thread does a lot of computation besides drawing the PRNs, since the time to prewind the PRNG can be substantial in some cases (e.g. with many iterations).</p>

<p>A third option exists for the case when there is a lot of data processing besides drawing a PRN. This one uses OpenMP ordered loops (note that the iteration chunk size is set to 1):</p>

<pre class="lang-cpp prettyprint-override"><code>#pragma omp parallel for ordered schedule(static,1)
for (int i = 0; i &lt; n; i++) {
    double rnum;  // declared inside the loop, so each thread has its own copy
    #pragma omp ordered
    {
        // executed strictly in iteration order, one thread at a time
        rnum = R::rnorm(mu, sigma);
    }
    // process() is a placeholder for "lots of processing on rnum"
    out(i) = process(rnum);
}
</code></pre>

<p>Although loop ordering essentially serialises the drawing of the PRNs, it still allows the processing of <code>rnum</code> to execute in parallel, and hence a parallel speed-up would be observed. See <a href="https://stackoverflow.com/a/13230816/1374437">this answer</a> for a better explanation as to why.</p>
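<p>The prewound-PRNG approach described above can be sketched as follows. This is only a minimal illustration under some assumptions: it uses <code>std::mt19937</code> from the C++ standard library instead of R's generator, it draws raw uniform numbers rather than normal variates (a distribution object may consume a varying number of engine outputs per variate, which would break the simple position arithmetic), and the names <code>draw_sequential</code> and <code>draw_prewound</code> are made up for this example. Prewinding is done with the engine's <code>discard()</code> member.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Reference version: one engine fills out[0..n) sequentially.
std::vector<double> draw_sequential(std::uint32_t seed, std::size_t n) {
    std::mt19937 engine(seed);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = engine() / 4294967296.0;  // raw 32-bit output scaled to [0, 1)
    }
    return out;
}

// Prewound version (hypothetical helper): the index range is split into
// num_chunks contiguous blocks. Each block gets its own engine with the
// same seed and skips ahead to the block's first index before drawing.
std::vector<double> draw_prewound(std::uint32_t seed, std::size_t n,
                                  int num_chunks) {
    assert(num_chunks > 0);
    std::vector<double> out(n);
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceiling

    // Without OpenMP enabled the pragma is ignored and the loop simply
    // runs serially, producing the same result.
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < num_chunks; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);

        std::mt19937 engine(seed);  // same initial seed in every block
        engine.discard(begin);      // prewind: skip the first `begin` outputs

        for (std::size_t i = begin; i < end; ++i) {
            out[i] = engine() / 4294967296.0;
        }
    }
    return out;
}
```

<p>For any number of blocks, <code>draw_prewound(seed, n, k)</code> should return exactly the same vector as <code>draw_sequential(seed, n)</code>. The cost of <code>discard()</code> typically grows with the number of skipped outputs, which is the prewinding overhead mentioned above.</p>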