<p>Systems with benchmark performance in the hundreds of teraflops are usually considered supercomputers. A typical feature of these machines is their large number of computing nodes, typically in the range of <code>O(10^3)</code> to <code>O(10^6)</code>. This distinguishes them from small-to-midsize computing clusters, which usually have <code>O(10)</code> to <code>O(10^2)</code> nodes.</p>

<p>When writing software that aims to make effective use of these resources, a number of challenges arise that are usually not present when working on single-core systems or even small clusters:</p>

<h3>Higher degree of parallelization required</h3>

<p>According to <a href="http://en.wikipedia.org/wiki/Amdahl%27s_law" rel="nofollow">Amdahl's Law</a>, the maximum speedup one can achieve on a parallel computer is limited by the fraction of serial work in your code (i.e. the parts that cannot be parallelized). That means the more processors you have, the better your parallelization concept has to be (see the worked example after this list).</p>

<h3>Specialized hardware and software</h3>

<p>Most supercomputers are custom-built and use specialized hardware and/or software components, so you have to learn a lot about new types of architectures if you want to get maximum performance. Typical examples are the network hardware, the file system, and the available compilers (including their optimization options).</p>

<h3>Parallel file I/O becomes a serious bottleneck</h3>

<p>Good parallel file systems handle multiple requests in parallel rather well. However, there is a limit, and most file systems do not support simultaneous access by thousands of processes. Reading from or writing to a single file then becomes internally serialized again, even if you are using parallel I/O concepts such as MPI I/O (sketched below).</p>

<h3>Debugging massively parallel applications is a pain</h3>

<p>If your code has a problem that only appears when it runs with a certain number of processes, debugging can become very cumbersome, especially if you are not sure where exactly the problem arises. Examples of problems that depend on the process count are domain decomposition and the establishment of communication patterns.</p>

<h3>Load balancing and communication patterns matter (even more)</h3>

<p>This is similar to the first point. Assume that one of your computing nodes takes slightly longer (say, one millisecond) to reach a synchronization point that all processes must pass. With <code>101</code> nodes, you waste only 100 * 1 millisecond = <code>0.1 s</code> of computational time. With 100,001 nodes, however, you already waste <code>100 s</code>. If this happens repeatedly (e.g. in every iteration of a big loop) and you have many iterations, using more processors quickly becomes uneconomical.</p>
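<p>To make the Amdahl's Law point concrete, here is a minimal C sketch (an illustration, not part of the original argument) that evaluates the formula <code>S(N) = 1 / (s + (1 - s) / N)</code> for a hypothetical serial fraction of <code>s = 0.01</code>; the numbers are illustrative only:</p>

<pre><code>#include <stdio.h>

/* Maximum speedup according to Amdahl's Law:
 * S(N) = 1 / (s + (1 - s) / N),
 * where s is the serial fraction and N the number of processors. */
static double amdahl_speedup(double serial_fraction, double n_procs)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs);
}

int main(void)
{
    /* Even with only 1% serial code, 10^5 processors give a
     * speedup of at most about 100, not 100,000. */
    double n[] = { 1e2, 1e3, 1e4, 1e5 };
    for (int i = 0; i < 4; i++)
        printf("N = %8.0f  ->  max speedup = %6.1f\n",
               n[i], amdahl_speedup(0.01, n[i]));
    return 0;
}
</code></pre>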
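<p>Regarding the file I/O point, the following is a minimal sketch of collective MPI I/O in C; the file name <code>output.dat</code> and the block size are arbitrary choices for illustration, and a real application would tune them to the file system:</p>

<pre><code>#include <mpi.h>

enum { BLOCK = 1024 };  /* bytes written per rank; arbitrary */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fill a buffer with rank-specific data. */
    char buf[BLOCK];
    for (int i = 0; i < BLOCK; i++)
        buf[i] = (char)('a' + rank % 26);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: all ranks participate at once, each at its
     * own offset, so the MPI library can aggregate the requests
     * instead of issuing one file-system request per process. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK;
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_CHAR,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
</code></pre>

<p>Even with collective calls like this, the caveat above stands: at thousands of processes the accesses can still serialize inside the file system.</p>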
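<p>For the debugging point, one common workaround is to print each rank's PID and host and let a single suspect rank spin until a debugger attaches; the chosen rank (<code>42</code>) is an arbitrary example:</p>

<pre><code>#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);
    printf("rank %d: pid %d on %s\n", rank, (int)getpid(), host);
    fflush(stdout);

    if (rank == 42) {                 /* the rank you suspect */
        volatile int wait_for_debugger = 1;
        while (wait_for_debugger)     /* attach with: gdb -p <pid>, then:     */
            sleep(1);                 /* set var wait_for_debugger = 0        */
    }
    MPI_Barrier(MPI_COMM_WORLD);      /* the other ranks wait here meanwhile */

    /* ... rest of the application ... */

    MPI_Finalize();
    return 0;
}
</code></pre>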
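<p>Finally, for the load-balancing point, here is a sketch of how one might measure the imbalance at a synchronization point: each rank times its own work, and a reduction reports the spread. <code>do_local_work</code> is a hypothetical placeholder for the per-rank workload:</p>

<pre><code>#include <mpi.h>
#include <stdio.h>

/* Placeholder for the actual per-rank workload. */
static void do_local_work(int rank) { (void)rank; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    do_local_work(rank);
    double elapsed = MPI_Wtime() - t0;

    MPI_Barrier(MPI_COMM_WORLD);      /* the synchronization point */

    /* The max-min spread is the time the fastest rank spends idle. */
    double tmin, tmax;
    MPI_Reduce(&elapsed, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&elapsed, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("work time: min %.6f s, max %.6f s, spread %.6f s\n",
               tmin, tmax, tmax - tmin);

    MPI_Finalize();
    return 0;
}
</code></pre>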