
POC structures with dynamic data with CUDA kernels?
Let's say I have a data structure:

```cuda
struct MyBigData {
    float *dataArray;
    float *targetArray;
    float *nodes;
    float *dataDataData;
};
```

I would like to be able to pass this structure around to various CUDA kernels. I don't want to have to pass multiple arrays as arguments, so can I just pass the structure and be done with it? I know the kernels support C structures, but what about dynamic memory inside those structures?

It seems I would just do this to create the structure on the CUDA card:

```cuda
MyBigData *mbd = (MyBigData *) cudaMalloc( sizeof(MyBigData) );
```

But what about the dynamic memory for the arrays inside the structure? The line below compiles, but gives a run-time error:

```cuda
mbd->dataArray = (float *) cudaMalloc( 10 * sizeof(float) );
```

This is because cudaMalloc() runs on the CPU, and it cannot dereference mbd->dataArray to set the pointer to the new memory address, hence the run-time error. The following, however, compiles and runs, but doesn't seem to be what I want:

```cuda
MyBigData *mbd = (MyBigData *) malloc( sizeof(MyBigData) );
mbd->dataArray = (float *) cudaMalloc( 10 * sizeof(float) );
```

Because now, although this is valid, mbd resides in main system memory while the float pointer points to memory allocated on the CUDA device. So I can't just pass a pointer to the MyBigData structure; I have to pass each member of the structure to the kernel individually. Not clean. What I want is:

```cuda
someKernel<<<1,1>>>(mbd);
```

Not:

```cuda
someKernel<<<1,1>>>(mbd->dataArray, mbd->targetArray, mbd->nodes, mbd->dataDataData);
```

So I was thinking: how about cudaMemcpy()?
I was thinking of this:

```cuda
MyBigData *d_mbd = cudaMemcpy( (void*) &d_mbd, (void*) mbd, SOMESIZE, CudaHostToDevice );
```

But then what do I put for SOMESIZE? I can't use sizeof(MyBigData), because that would include the size of the float pointers, not the actual size of the arrays. Second, is cudaMemcpy() even smart enough to dig down into sub-objects of a complicated data structure? I think not.

So, is it impossible to have a structure containing dynamic memory on the CUDA card? Or am I missing something? The easy way would be to have a CUDA kernel allocate some memory, but you can't call cudaMalloc() from a CUDA kernel.

Thoughts?

**UPDATE** 7 May: I wrote this code, and it compiles, but it tells me all the values are zero. I think I am creating the object correctly and populating the values properly with the CUDA kernel (the values are just the thread ID). I suspect I'm not printing the values properly. Thoughts? And thank you!

```cuda
MyBigData* generateData(const int size) {
    MyBigData *mbd_host, *mbd_cuda;
    mbd_host = (MyBigData *) malloc( sizeof(MyBigData) );
    cudaMalloc( (void**) &mbd_host->dataArray,    size * sizeof(float) );
    cudaMalloc( (void**) &mbd_host->targetArray,  size * sizeof(float) );
    cudaMalloc( (void**) &mbd_host->nodes,        size * sizeof(float) );
    cudaMalloc( (void**) &mbd_host->dataDataData, size * sizeof(float) );
    cudaMalloc( (void**) &mbd_cuda, sizeof(MyBigData) );
    cudaMemcpy( mbd_cuda, mbd_host, sizeof(mbd_host), cudaMemcpyHostToDevice );
    free(mbd_host);
    return mbd_cuda;
}

void printCudaData(MyBigData* mbd_cuda, const int size) {
    MyBigData *mbd;
    cudaMemcpy( mbd, mbd_cuda, sizeof(mbd_cuda), cudaMemcpyDeviceToHost );
    MyBigData *mbd_host = (MyBigData *) malloc( sizeof(MyBigData) );
    mbd_host->dataArray    = (float*) malloc(size * sizeof(float));
    mbd_host->targetArray  = (float*) malloc(size * sizeof(float));
    mbd_host->nodes        = (float*) malloc(size * sizeof(float));
    mbd_host->dataDataData = (float*) malloc(size * sizeof(float));
    cudaMemcpy( mbd_host->dataArray,    mbd->dataArray,    size * sizeof(float), cudaMemcpyDeviceToHost );
    cudaMemcpy( mbd_host->targetArray,  mbd->targetArray,  size * sizeof(float), cudaMemcpyDeviceToHost );
    cudaMemcpy( mbd_host->nodes,        mbd->nodes,        size * sizeof(float), cudaMemcpyDeviceToHost );
    cudaMemcpy( mbd_host->dataDataData, mbd->dataDataData, size * sizeof(float), cudaMemcpyDeviceToHost );
    for (int i = 0; i < size; i++) {
        printf("data[%i] = %f\n",   i, mbd_host->dataArray[i]);
        printf("target[%i] = %f\n", i, mbd_host->targetArray[i]);
        printf("nodes[%i] = %f\n",  i, mbd_host->nodes[i]);
        printf("data2[%i] = %f\n",  i, mbd_host->dataDataData[i]);
    }
    free(mbd_host->dataArray);
    free(mbd_host->targetArray);
    free(mbd_host->nodes);
    free(mbd_host->dataDataData);
    free(mbd_host);
}
```

This is my kernel and the function that calls it:

```cuda
__global__ void cudaInitData(MyBigData* mbd) {
    const int threadID = threadIdx.x;
    mbd->dataArray[threadID]    = threadID;
    mbd->targetArray[threadID]  = threadID;
    mbd->nodes[threadID]        = threadID;
    mbd->dataDataData[threadID] = threadID;
}

void initData(MyBigData* mbd, const int size) {
    if (mbd == NULL)
        mbd = generateData(size);
    cudaInitData<<<size,1>>>(mbd);
}
```

My `main()` calls:

```cuda
MyBigData* mbd = NULL;
initData(mbd, 10);
printCudaData(mbd, 10);
```
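For reference, one common pattern for this situation (a sketch under the question's struct definition, not code from the question itself): stage the structure in a host-side copy, `cudaMalloc` each member array into that copy, then copy the *whole structure* (i.e. the four device pointers) to a device-resident structure using `sizeof(MyBigData)`. The function name `makeDeviceData` is hypothetical:

```cuda
// Sketch only: builds a device-resident MyBigData whose members point
// at device arrays. Error checking omitted for brevity.
MyBigData *makeDeviceData(int size) {
    MyBigData h;                         // host-side staging copy
    cudaMalloc((void**)&h.dataArray,    size * sizeof(float));
    cudaMalloc((void**)&h.targetArray,  size * sizeof(float));
    cudaMalloc((void**)&h.nodes,        size * sizeof(float));
    cudaMalloc((void**)&h.dataDataData, size * sizeof(float));

    MyBigData *d_mbd;
    cudaMalloc((void**)&d_mbd, sizeof(MyBigData));
    // sizeof(MyBigData), not sizeof(d_mbd): copy the entire structure,
    // so the device copy holds valid device pointers.
    cudaMemcpy(d_mbd, &h, sizeof(MyBigData), cudaMemcpyHostToDevice);
    return d_mbd;                        // usable as someKernel<<<1,1>>>(d_mbd)
}
```

Note also that `initData(mbd, 10)` receives `mbd` by value, so the assignment `mbd = generateData(size)` inside it never reaches the `mbd` variable in `main()`; after the call, `main()`'s `mbd` is still NULL when `printCudaData` runs.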