What's the best way of encapsulating CUDA kernels?
<p>I'm trying to build a CUDA project that is as close to an OO design as possible. At the moment, the solution I have found is to use a struct to encapsulate the data, and for each method that needs GPU processing, three functions have to be implemented:</p>
<ol>
<li>The host method that will be called on the object.</li>
<li>A <strong>__global__</strong> function that calls a <strong>__device__</strong> method of that struct.</li>
<li>A <strong>__device__</strong> method inside the struct.</li>
</ol>
<p>Here is an example. Let's say I need to implement a method that initializes a buffer inside a struct. It would look something like this:</p>
<pre><code>struct Foo;

// Forward declaration so that Foo::init can launch the kernel.
__global__ void initFooKernel(Foo foo);

struct Foo
{
    float *buffer_;
    short2 buffer_resolution_;
    short2 block_size_;

    // (3) Device method: each thread zeroes one element of the buffer.
    __device__ void initBuffer()
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        // Skip threads that fall outside the buffer.
        if (x &lt; buffer_resolution_.x &amp;&amp; y &lt; buffer_resolution_.y)
        {
            int plain_index = (y * buffer_resolution_.x) + x;
            buffer_[plain_index] = 0;
        }
    }

    // (1) Host method called on the object.
    void init(const short2 &amp;buffer_resolution, const short2 &amp;block_size)
    {
        buffer_resolution_ = buffer_resolution;
        block_size_ = block_size;
        // EDIT1 - Added the cudaMalloc (the size is given in bytes)
        cudaMalloc((void **)&amp;buffer_, buffer_resolution.x * buffer_resolution.y * sizeof(float));
        dim3 threadsPerBlock(block_size.x, block_size.y);
        // Round up so that the grid covers the whole buffer.
        dim3 blocksPerGrid((buffer_resolution.x + threadsPerBlock.x - 1) / threadsPerBlock.x,
                           (buffer_resolution.y + threadsPerBlock.y - 1) / threadsPerBlock.y);
        // Pass the struct by value: `this` points to host memory here,
        // so the kernel could not dereference it.
        initFooKernel&lt;&lt;&lt;blocksPerGrid, threadsPerBlock&gt;&gt;&gt;(*this);
    }
};

// (2) Kernel that forwards to the device method.
__global__ void initFooKernel(Foo foo)
{
    foo.initBuffer();
}
</code></pre>
<p>I need to do this because it seems that I can't declare a <strong>__global__</strong> function inside the struct. I learned this pattern by looking at some open-source projects, but it seems very cumbersome to write THREE functions for every encapsulated GPU method. So, my question is: is this the best/only approach possible? Is it even a VALID approach?</p>
<p>EDIT1: I forgot to include the cudaMalloc that allocates the buffer before calling initFooKernel. Fixed it.</p>
 
