
OpenCL - Multiple GPU Buffer Synchronization
<p>I have an OpenCL kernel that calculates the total force exerted on each particle by the other particles in the system, and another kernel that integrates the particle positions and velocities. I would like to parallelize these kernels across multiple GPUs, assigning a subset of the particles to each GPU. However, I have to run these kernels many times, and the result from each GPU is used by every other. Let me explain that a little further:</p>

<p>Say you have particle 0 on GPU 0 and particle 1 on GPU 1. The force on particle 0 changes, as does the force on particle 1, and the integrator then updates their positions and velocities accordingly. These new positions need to be placed on each GPU (both GPUs need to know where both particle 0 and particle 1 are), and they are used to calculate the forces on each particle in the next step, which feed the integrator, whose results feed the next force calculation, and so on. <strong>Essentially, all the buffers need to contain the same information by the time the force calculations roll around.</strong></p>

<p>So, the question is: <strong>what is the best way to synchronize buffers across GPUs, given that each GPU has its own buffer</strong>? They cannot share a single buffer if I want to keep parallelism, <a href="https://stackoverflow.com/questions/11562543/clenqueuendrange-blocking-on-nvidia-hardware-also-multi-gpu">as per my last question</a> (though if there is a way to create a shared buffer and still keep multiple GPUs, I'm all for that). I suspect that the cost of copying the results at each step will outweigh the benefit of parallelizing the algorithm across GPUs.</p>

<p>I did find <a href="https://stackoverflow.com/questions/11093826/read-write-opencl-memory-buffers-on-multiple-gpu-in-a-single-context">this thread</a>, but the answer was not very definitive and applies only to a single buffer shared across all GPUs. I would like an answer specific to Nvidia GPUs (the Tesla M2090, in particular).</p>

<p><strong>EDIT:</strong> Actually, as per <a href="http://www.khronos.org/message_boards/viewtopic.php?f=37&amp;t=2133" rel="nofollow noreferrer">this thread on the Khronos forums</a>, a representative from the OpenCL working group says that a single buffer in a shared context does indeed get spread across multiple GPUs, with each GPU making sure it has the latest data in memory. However, I'm not seeing that behavior on Nvidia GPUs: when I run <code>watch -n .5 nvidia-smi</code> while my program runs in the background, I see one GPU's memory usage go up for a while and then go down while another GPU's goes up. Can anyone point me in the right direction with this? Maybe it's just their implementation?</p>