You can find an exactly identical API in OpenCL.

How it works in CUDA:

According to [this presentation](http://coitweb.uncc.edu/~abw/ITCS6010S11/ZeroCopyMemory.pptx) and the [official documentation](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ga475419a9b21a66036029d5001ea908c).

The money quote about `cudaHostGetDevicePointer`:

> Passes back device pointer of mapped host memory allocated by cudaHostAlloc or registered by cudaHostRegister.

CUDA's `cudaHostAlloc` with `cudaHostGetDevicePointer` works exactly like `CL_MEM_ALLOC_HOST_PTR` with `MapBuffer` in OpenCL. Basically, on a discrete GPU the results are cached on the device, and on an integrated GPU that shares memory with the host the memory is used directly. So there is no actual 'zero copy' operation with a discrete GPU in CUDA.

The function `cudaHostGetDevicePointer` does not take raw malloc'd pointers, which is exactly the limitation in OpenCL. From the API user's point of view the two are identical approaches, allowing the implementations to do pretty much identical optimizations.

With a discrete GPU the pointer you get points to an area the GPU can transfer data into directly via DMA. Otherwise the driver would take your pointer, copy the data to the DMA area, and then initiate the transfer.

In OpenCL 2.0, however, using raw host pointers is explicitly possible, depending on the capabilities of your devices.
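The `cudaHostAlloc` / `cudaHostGetDevicePointer` flow described above can be sketched roughly like this. The kernel and the buffer size are made up for illustration, and error checking is omitted; this obviously needs an NVIDIA GPU to run:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: increments each element through the mapped pointer.
__global__ void inc(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    // Older devices require this before mapped allocations are usable.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 256;

    // Page-locked host memory, mapped into the device address space.
    int *h_ptr = nullptr;
    cudaHostAlloc((void **)&h_ptr, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_ptr[i] = i;

    // Device-side alias of the same allocation. This only works for memory
    // from cudaHostAlloc/cudaHostRegister, not for a raw malloc() pointer.
    int *d_ptr = nullptr;
    cudaHostGetDevicePointer((void **)&d_ptr, h_ptr, 0);

    // On an integrated GPU the kernel touches host memory directly;
    // on a discrete GPU each access crosses the bus via DMA.
    inc<<<(n + 127) / 128, 128>>>(d_ptr, n);
    cudaDeviceSynchronize();

    printf("h_ptr[10] = %d\n", h_ptr[10]);
    cudaFreeHost(h_ptr);
    return 0;
}
```

Note there is no `cudaMemcpy` anywhere: the host writes and the kernel reads the same allocation through the two aliased pointers.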
With the finest-granularity sharing you can use arbitrary malloc'd host pointers and even share atomics with the host, so you could dynamically control the kernel from the host while it is running.

http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf

See page 162 for the shared virtual memory spec. Do note that when you write kernels, even these are still just `__global` pointers from the kernel's point of view.
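On the host side, querying the SVM capabilities and picking between the two fine-grained flavors looks roughly like this sketch. Error checking and the kernel itself are omitted, the buffer size is illustrative, and an OpenCL 2.0 device is required to actually run it:

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* Ask the device which SVM granularities it supports. */
    cl_device_svm_capabilities caps;
    clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
                    sizeof(caps), &caps, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    if (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) {
        /* Finest granularity: any malloc'd pointer can be handed to a
           kernel via clSetKernelArgSVMPointer, no special allocator. */
        int *data = malloc(1024 * sizeof(int));
        /* ... clSetKernelArgSVMPointer(kernel, 0, data); ... */
        free(data);
    } else if (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) {
        /* Fine-grained buffer SVM: allocate with clSVMAlloc instead;
           host and device then see each other's writes without
           map/unmap round trips. */
        int *data = clSVMAlloc(ctx,
                               CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
                               1024 * sizeof(int), 0);
        /* ... clSetKernelArgSVMPointer(kernel, 0, data); ... */
        clSVMFree(ctx, data);
    } else {
        printf("only coarse-grained SVM (or none) available\n");
    }

    clReleaseContext(ctx);
    return 0;
}
```

Adding `CL_MEM_SVM_ATOMICS` to the `clSVMAlloc` flags (when `CL_DEVICE_SVM_ATOMICS` is reported) is what enables the host/device atomics mentioned above.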
 
