
POC structures with dynamic data with CUDA kernels?
Let's say I have a data structure:

```cuda
struct MyBigData {
    float *dataArray;
    float *targetArray;
    float *nodes;
    float *dataDataData;
};
```

I would like to be able to pass this structure around to various CUDA kernels. I don't want to have to pass multiple arrays as arguments, so can I just pass the structure and be done with it? I know the kernels support C structures, but what about dynamic memory inside those structures?

It seems I would just do this to create the structure on the CUDA card:

```cuda
MyBigData *mbd = (MyBigData *) cudaMalloc( sizeof(MyBigData) );
```

But what about the dynamic memory for the arrays inside the structure? The line below compiles, but gives a run-time error:

```cuda
mbd->dataArray = (float *) cudaMalloc( 10 * sizeof(float) );
```

This is because cudaMalloc() runs on the CPU, and it cannot dereference mbd->dataArray to set the pointer to the new memory address, hence the run-time error. The following, however, compiles and runs, but doesn't seem to be what I want:

```cuda
MyBigData *mbd = (MyBigData *) malloc( sizeof(MyBigData) );
mbd->dataArray = (float *) cudaMalloc( 10 * sizeof(float) );
```

Because now, although this is valid, mbd resides in main system memory while the float pointer points to memory allocated on the CUDA device. So I can't just pass a pointer to the MyBigData structure; I have to pass each member of the structure to the kernel individually. Not clean. What I want is:

```cuda
someKernel<<<1,1>>>(mbd);
```

Not:

```cuda
someKernel<<<1,1>>>(mbd->dataArray, mbd->targetArray, mbd->nodes, mbd->dataDataData);
```

So I was thinking: how about cudaMemcpy()?
I was thinking of this:

```cuda
MyBigData *d_mbd = cudaMemcpy( (void*) &d_mbd, (void*) mbd, SOMESIZE, CudaHostToDevice );
```

But then what do I put for SOMESIZE? I can't use sizeof(MyBigData), because that would include the size of the float pointers, not the actual size of the arrays. Second, is cudaMemcpy() even smart enough to dig down into sub-objects of a complicated data structure? I think not.

So, is it impossible to have a structure containing dynamic memory on the CUDA card? Or am I missing something? The easy way would be to have a CUDA kernel allocate some memory, but you can't call cudaMalloc() from a CUDA kernel.

Thoughts?

**UPDATE** 7 May: I wrote this code, and it compiles, but it tells me all the values are zero. I think I am creating the object correctly and populating the values properly with the CUDA kernel (the values are just the thread ID). I suspect I'm not printing the values properly. Thoughts? And thank you!

```cuda
MyBigData* generateData(const int size) {
    MyBigData *mbd_host, *mbd_cuda;
    mbd_host = (MyBigData *) malloc( sizeof(MyBigData) );
    cudaMalloc( (void**) &mbd_host->dataArray,    size * sizeof(float) );
    cudaMalloc( (void**) &mbd_host->targetArray,  size * sizeof(float) );
    cudaMalloc( (void**) &mbd_host->nodes,        size * sizeof(float) );
    cudaMalloc( (void**) &mbd_host->dataDataData, size * sizeof(float) );
    cudaMalloc( (void**) &mbd_cuda, sizeof(MyBigData) );
    cudaMemcpy( mbd_cuda, mbd_host, sizeof(mbd_host), cudaMemcpyHostToDevice );
    free(mbd_host);
    return mbd_cuda;
}

void printCudaData(MyBigData* mbd_cuda, const int size) {
    MyBigData *mbd;
    cudaMemcpy( mbd, mbd_cuda, sizeof(mbd_cuda), cudaMemcpyDeviceToHost );
    MyBigData *mbd_host = (MyBigData *) malloc( sizeof(MyBigData) );
    mbd_host->dataArray    = (float*) malloc(size * sizeof(float));
    mbd_host->targetArray  = (float*) malloc(size * sizeof(float));
    mbd_host->nodes        = (float*) malloc(size * sizeof(float));
    mbd_host->dataDataData = (float*) malloc(size * sizeof(float));
    cudaMemcpy( mbd_host->dataArray,    mbd->dataArray,    size * sizeof(float), cudaMemcpyDeviceToHost );
    cudaMemcpy( mbd_host->targetArray,  mbd->targetArray,  size * sizeof(float), cudaMemcpyDeviceToHost );
    cudaMemcpy( mbd_host->nodes,        mbd->nodes,        size * sizeof(float), cudaMemcpyDeviceToHost );
    cudaMemcpy( mbd_host->dataDataData, mbd->dataDataData, size * sizeof(float), cudaMemcpyDeviceToHost );
    for (int i = 0; i < size; i++) {
        printf("data[%i] = %f\n",   i, mbd_host->dataArray[i]);
        printf("target[%i] = %f\n", i, mbd_host->targetArray[i]);
        printf("nodes[%i] = %f\n",  i, mbd_host->nodes[i]);
        printf("data2[%i] = %f\n",  i, mbd_host->dataDataData[i]);
    }
    free(mbd_host->dataArray);
    free(mbd_host->targetArray);
    free(mbd_host->nodes);
    free(mbd_host->dataDataData);
    free(mbd_host);
}
```

This is my kernel and the function that calls it:

```cuda
__global__ void cudaInitData(MyBigData* mbd) {
    const int threadID = threadIdx.x;
    mbd->dataArray[threadID]    = threadID;
    mbd->targetArray[threadID]  = threadID;
    mbd->nodes[threadID]        = threadID;
    mbd->dataDataData[threadID] = threadID;
}

void initData(MyBigData* mbd, const int size) {
    if (mbd == NULL)
        mbd = generateData(size);
    cudaInitData<<<size,1>>>(mbd);
}
```

My `main()` calls:

```cuda
MyBigData* mbd = NULL;
initData(mbd, 10);
printCudaData(mbd, 10);
```
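For reference, one common pattern for this situation (a sketch under the question's struct definition, not code from the question itself): stage the structure in a host-side copy, `cudaMalloc` each member array into that copy, then copy the *whole structure* (i.e. the four device pointers) to a device-resident structure using `sizeof(MyBigData)`. The function name `makeDeviceData` is hypothetical:

```cuda
// Sketch only: builds a device-resident MyBigData whose members point
// at device arrays. Error checking omitted for brevity.
MyBigData *makeDeviceData(int size) {
    MyBigData h;                         // host-side staging copy
    cudaMalloc((void**)&h.dataArray,    size * sizeof(float));
    cudaMalloc((void**)&h.targetArray,  size * sizeof(float));
    cudaMalloc((void**)&h.nodes,        size * sizeof(float));
    cudaMalloc((void**)&h.dataDataData, size * sizeof(float));

    MyBigData *d_mbd;
    cudaMalloc((void**)&d_mbd, sizeof(MyBigData));
    // sizeof(MyBigData), not sizeof(d_mbd): copy the entire structure,
    // so the device copy holds valid device pointers.
    cudaMemcpy(d_mbd, &h, sizeof(MyBigData), cudaMemcpyHostToDevice);
    return d_mbd;                        // usable as someKernel<<<1,1>>>(d_mbd)
}
```

Note also that `initData(mbd, 10)` receives `mbd` by value, so the assignment `mbd = generateData(size)` inside it never reaches the `mbd` variable in `main()`; after the call, `main()`'s `mbd` is still NULL when `printCudaData` runs.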