Numerical process simulations are typically run over a single discretised problem grid (for example, the [surface of the Earth](http://www.metoffice.gov.uk/science/creating/daysahead/nwp/um.html) or [clouds of gas and dust](http://www.mpa-garching.mpg.de/gadget/)), which usually rules out [simple task farming](http://en.wikipedia.org/wiki/Embarrassingly_parallel) or concurrency approaches. This is because a grid divided over a set of processors representing an area of physical space is not a set of independent tasks. The grid cells at the edge of each subgrid need to be updated based on the values of grid cells stored on other processors, which are adjacent in logical space.

In [high-performance computing](http://en.wikipedia.org/wiki/High-performance_computing), simulations are typically [parallelised](http://en.wikipedia.org/wiki/Parallel_computing) using either [MPI](http://en.wikipedia.org/wiki/Message_Passing_Interface) or [OpenMP](http://en.wikipedia.org/wiki/OpenMP). MPI is a message-passing library with bindings for many languages, including [C, C++, Fortran](http://www.lam-mpi.org/tutorials/bindings/), [Python](http://mpi4py.scipy.org/), and [C#](http://osl.iu.edu/research/mpi.net/). OpenMP is an API for shared-memory multiprocessing. In general, MPI is more difficult to code than OpenMP, and is much more invasive, but is also much more flexible. OpenMP requires a memory area shared between processors, so is not suited to many architectures. [Hybrid schemes](http://mc.stanford.edu/cgi-bin/images/6/60/Rabenseifner_hybrid_03.pdf) are also possible.

This type of programming has its own special challenges. As well as [race conditions](http://en.wikipedia.org/wiki/Race_condition), [deadlocks](http://en.wikipedia.org/wiki/Deadlock), [livelocks](http://en.wikipedia.org/wiki/Deadlock#Livelock), and all the other joys of [concurrent programming](http://en.wikipedia.org/wiki/Concurrent_computing), you need to consider the [topology](https://computing.llnl.gov/tutorials/mpi/#Virtual_Topologies) of your processor grid - how you choose to split your logical grid across your physical processors. This is important because your parallel [speedup](http://en.wikipedia.org/wiki/Speedup) is a function of the amount of communication between your processors, which is itself a function of the total edge length of your decomposed grid. As you add more processors, this surface area increases, increasing the amount of [communication overhead](http://en.wikipedia.org/wiki/Parallel_slowdown). Increasing the [granularity](http://en.wikipedia.org/wiki/Granularity#In_computing) will eventually become prohibitive.
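To make the halo-exchange idea concrete, here is a minimal sketch using the mpi4py bindings mentioned above, assuming a 1-D decomposition of a 2-D grid into horizontal strips. The field `u`, the local strip size, and the Jacobi-style averaging stencil are illustrative stand-ins, not part of any particular simulation code.

```python
# Minimal halo-exchange sketch (mpi4py + NumPy): each rank owns a horizontal
# strip of a 2-D grid plus one "ghost" row at each edge, refreshed from the
# neighbouring ranks before every update step.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# A 1-D periodic Cartesian topology lets MPI tell us who our neighbours are.
cart = comm.Create_cart(dims=[comm.Get_size()], periods=[True])
prev_rank, next_rank = cart.Shift(0, 1)   # ranks owning the strips before/after ours

nx_local, ny = 100, 200                   # illustrative local strip size
u = np.zeros((nx_local + 2, ny))          # +2 rows: one ghost row at each edge

def exchange_halos(u):
    # Our first real row goes to the previous rank; their last real row
    # arrives in our bottom ghost row (and vice versa for the other edge).
    cart.Sendrecv(sendbuf=u[1, :],  dest=prev_rank, recvbuf=u[-1, :], source=next_rank)
    cart.Sendrecv(sendbuf=u[-2, :], dest=next_rank, recvbuf=u[0, :],  source=prev_rank)

for step in range(100):
    exchange_halos(u)
    # Stand-in physics kernel: a 5-point averaging stencil over interior cells.
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
```

Run it with something like `mpiexec -n 4 python halo.py`. Pairing the send and receive in a single `Sendrecv` call avoids the classic deadlock you get when every rank issues a blocking send before its matching receive.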
<a href="http://en.wikipedia.org/wiki/Amdahl%27s_law" rel="nofollow noreferrer">Amdahl's law</a> then dictates the maximum theoretically attainable speedup. You should be able to estimate this before you start writing any code.</p> <p>Both of these facts will conspire to limit the maximum number of processors you can run on. The sweet spot may be considerably lower than you think.</p> <p>I recommend the book <a href="http://oreilly.com/catalog/9781565923126" rel="nofollow noreferrer">High Performance Computing</a>, if you can get hold of it. In particular, the chapter on performance benchmarking and tuning is priceless.</p> <p>An excellent online overview of parallel computing, which covers the major issues, is this introduction from <a href="https://computing.llnl.gov/tutorials/parallel_comp/" rel="nofollow noreferrer">Lawerence Livermore National Laboratory</a>.</p>
 
