<p>Before anything else, <a href="https://stackoverflow.com/questions/14791801/how-do-user-level-threads-ults-and-kernel-level-threads-klts-differ-with-reg/14792010#14792010">templatetypedef</a>'s answer is beautiful; I simply wanted to extend his response a little.</p> <p>There is one area I felt needed expanding: <strong>combinations of ULTs and KLTs</strong>. To understand why this matters (Wikipedia calls it <a href="http://en.wikipedia.org/wiki/Thread_%28computer_science%29#M:N_.28Hybrid_threading.29" rel="nofollow noreferrer">hybrid threading</a>), consider the following examples:</p> <p>Consider a multi-threaded program (multiple KLTs) where there are more KLTs than available logical cores. In order to use every core efficiently, as you mentioned, you want the scheduler to switch out KLTs that are blocking for ones that are in a ready state and not blocking. This reduces the time each core sits idle. Unfortunately, switching KLTs is expensive for the scheduler and consumes a relatively large amount of CPU time.</p> <p>This is one area where hybrid threading can be helpful. Consider a multi-threaded program with multiple KLTs and ULTs. Just as <strong>templatetypedef</strong> noted, only one ULT can be running at one time for each KLT. If a ULT is blocking, we still want to switch it out for one which is not blocking. Fortunately, ULTs are much more lightweight than KLTs, in the sense that there are fewer resources assigned to a ULT and switching between them requires no interaction with the kernel scheduler. Essentially, it is almost always quicker to switch out ULTs than it is to switch out KLTs. As a result, we are able to significantly reduce a core's idle time relative to the first example.</p> <p>Now, of course, all of this depends on the threading library being used to implement ULTs. 
There are two ways (that I can come up with) of "mapping" ULTs to KLTs.</p> <ol> <li><p>A collection of ULTs <strong>for all</strong> KLTs</p> <p>This situation is ideal on a shared memory system. There is essentially a "pool" of ULTs to which each KLT has access. Ideally, the threading library's scheduler would assign ULTs to each KLT upon request, as opposed to the KLTs accessing the pool individually. The latter could cause race conditions or deadlocks if not implemented with locks or something similar.</p></li> <li><p>A collection of ULTs <strong>for each</strong> KLT (<a href="http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=4536359" rel="nofollow noreferrer">Qthreads</a>)</p> <p>This situation is ideal on a distributed memory system. Each KLT would have its own collection of ULTs to run. The drawback is that the user (or the threading library) would have to divide the ULTs between the KLTs. This could result in load imbalance, since it is not guaranteed that all ULTs will have the same amount of work to do and finish in roughly the same amount of time. The solution is to allow ULT migration; that is, moving ULTs between KLTs.</p></li> </ol>
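<p>To make the second mapping and the migration idea concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions, not how Qthreads or any real threading library works: ULTs are modelled as generators (each <code>yield</code> is a voluntary switch point), KLTs are <code>threading.Thread</code> workers, and the names <code>ult</code>, <code>worker</code>, and <code>run</code> are hypothetical. All ULTs are deliberately placed on the first KLT's queue, so an idle KLT has to take a ULT from a peer.</p>

```python
import threading
from collections import deque


def ult(name, steps, log):
    """Hypothetical ULT: a generator that does one unit of work per step.

    Each `yield` hands control back to the KLT that is running it, which
    stands in for a cheap user-level context switch.
    """
    for i in range(steps):
        log.append((name, i))
        yield


def worker(my_id, queues, finished):
    """Hypothetical KLT: runs ULTs from its own queue, migrating one from a peer when idle."""
    mine = queues[my_id]
    while True:
        task = None
        try:
            task = mine.popleft()  # prefer our own ULTs
        except IndexError:
            # Own queue is empty: try to migrate a ULT from another KLT.
            for other in queues:
                if other is mine:
                    continue
                try:
                    task = other.pop()  # take from the far end of the peer's queue
                    break
                except IndexError:
                    continue
        if task is None:
            return  # nothing to run and nothing to migrate
        try:
            next(task)         # run the ULT up to its next switch point
            mine.append(task)  # not finished: it now lives on this KLT's queue
        except StopIteration:
            finished.append(my_id)  # record which KLT completed a ULT


def run(workloads, n_klts):
    """Run the ULTs described by `workloads` (name -> number of steps) on `n_klts` workers."""
    log, finished = [], []
    queues = [deque() for _ in range(n_klts)]
    # Deliberately unbalanced start: every ULT is assigned to KLT 0.
    for name, steps in workloads.items():
        queues[0].append(ult(name, steps, log))
    threads = [
        threading.Thread(target=worker, args=(i, queues, finished))
        for i in range(n_klts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, finished
```

<p>Calling <code>run({"a": 3, "b": 1, "c": 2}, 2)</code> runs every step of every ULT exactly once, but which KLT runs which step depends on timing. The sketch relies on <code>deque.append</code>, <code>pop</code>, and <code>popleft</code> being thread-safe; a shared pool (the first mapping) would be the same code with a single queue shared by all workers.</p>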
 
