StackOverflow2013


plurals

PO Incorrect results for CUDA Matrix Multiplication
primarykey
Id
16157754
data
AcceptedAnswerId
16157888
AnswerCount
2
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-04-22T22:23:15.947
FavoriteCount
0
LastActivityDate
2013-04-25T20:10:49.320
LastEditDate
LastEditorUserId
0
OwnerUserId
1778852
ParentId
0
PostTypeId
1
Score
0
ViewCount
512
LastEditorDisplayName
text
Body
Let me start off by apologizing for this post. I know there have been several posts asking the same question as I will here, but I've tried the solutions that were given and I'm still not getting correct results for CUDA matrix multiplication. From examples I've followed, I'm pretty sure my algorithm within the kernel is correct. I don't believe I'm having any trouble passing the 2D arrays to the kernel, and as they're passed by reference, I feel like the 2D solution array should contain the correct answers by the time the array is printed in the host, but it doesn't. Could it be an issue with my dim3 dimGrid(B, B) and dim3 dimThreads(T, T) variables? I'm new to the CUDA framework and am still trying to wrap my head around it. Any suggestions would be greatly appreciated. My code is as follows:
<pre><code>#include <stdio.h>
#include <cuda.h>
#include <stdlib.h>

__global__ void MatMultiply (int *a, int *b, int *c, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int val = 0;

    for (int e = 0; e < N; ++e) {
        val += a[row*N + e] * b[e*N + col];
    }
    c[row*N+col] = val;
}

int main(void) {
    int N, B, T;

    printf("Input integer for matrix dimension size: ");
    scanf("%d", &N);

    printf("Input number of threads in a block: ");
    scanf("%d", &T);

    printf("Input number of blocks in a grid: ");
    scanf("%d", &B);

    int size = N * N * sizeof(int);

    int *a, *b, *c;

    a = (int*)malloc(size);
    b = (int*)malloc(size);
    c = (int*)malloc(size);

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a[i*N+j] = j + i*N;
            b[i*N+j] = j + i*N;
            c[i*N+j] = j + i*N;
        }
    }

    int *dev_a, *dev_b, *dev_c;

    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_c, c, size, cudaMemcpyHostToDevice);

    dim3 dimGrid(B, B);
    dim3 dimThreads(T, T);
    MatMultiply<<<B, T>>>(dev_a,dev_b,dev_c, N);

    cudaMemcpy(c, dev_c, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            printf("%d\t", b[i*N + j]);
        }
        printf("\n");
    }

    free(a);
    free(b);
    free(c);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    return 0;
}
</code></pre>
Thanks again.
Tags
<matrix><cuda><multiplication>
Title
Incorrect results for CUDA Matrix Multiplication
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USChris
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. This table or related slice is empty.
