
Incorrect results for CUDA Matrix Multiplication
<p>Let me start by apologizing for this post. I know several posts have already asked the same question, but I've tried the solutions given there and I'm still not getting correct results for CUDA matrix multiplication.</p> <p>Based on the examples I've followed, I'm fairly sure the algorithm inside the kernel is correct. I don't believe I'm having any trouble passing the 2D arrays (stored as flattened 1D arrays) to the kernel, and since they're passed as pointers, I'd expect the solution array to hold the correct answers by the time it is printed on the host, but it doesn't.</p> <p>Could the problem be my <code>dim3 dimGrid(B, B)</code> and <code>dim3 dimThreads(T, T)</code> variables? I'm new to CUDA and still trying to wrap my head around it, so any suggestions would be greatly appreciated. My code is as follows:</p> <pre><code>#include &lt;stdio.h&gt;
#include &lt;cuda.h&gt;
#include &lt;stdlib.h&gt;

__global__ void MatMultiply (int *a, int *b, int *c, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int val = 0;

    for (int e = 0; e &lt; N; ++e) {
        val += a[row*N + e] * b[e*N + col];
    }
    c[row*N+col] = val;
}

int main(void) {
    int N, B, T;

    printf("Input integer for matrix dimension size: ");
    scanf("%d", &amp;N);

    printf("Input number of threads in a block: ");
    scanf("%d", &amp;T);

    printf("Input number of blocks in a grid: ");
    scanf("%d", &amp;B);

    int size = N * N * sizeof(int);

    int *a, *b, *c;

    a = (int*)malloc(size);
    b = (int*)malloc(size);
    c = (int*)malloc(size);

    for (int i = 0; i &lt; N; i++) {
        for (int j = 0; j &lt; N; j++) {
            a[i*N+j] = j + i*N;
            b[i*N+j] = j + i*N;
            c[i*N+j] = j + i*N;
        }
    }

    int *dev_a, *dev_b, *dev_c;

    cudaMalloc((void**)&amp;dev_a, size);
    cudaMalloc((void**)&amp;dev_b, size);
    cudaMalloc((void**)&amp;dev_c, size);

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_c, c, size, cudaMemcpyHostToDevice);

    dim3 dimGrid(B, B);
    dim3 dimThreads(T, T);

    MatMultiply&lt;&lt;&lt;B, T&gt;&gt;&gt;(dev_a,dev_b,dev_c, N);

    cudaMemcpy(c, dev_c, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i &lt; N; i++) {
        for (int j = 0; j &lt; N; j++) {
            printf("%d\t", b[i*N + j]);
        }
        printf("\n");
    }

    free(a);
    free(b);
    free(c);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    return 0;
}
</code></pre> <p>Thanks again.</p>
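One way to see what the launch in the question actually does, without a GPU, is to emulate it on the host. The sketch below is plain C, not CUDA, and is only an approximation of the execution model: `Dim2`, `mat_multiply_thread`, `emulate_launch`, `grid_for` and `correct_entries` are hypothetical names introduced here, and the "threads" run one after another rather than in parallel. It copies the kernel body, adds a `row >= N || col >= N` guard that the original kernel lacks, and counts how many entries of `c` match a CPU reference product for a given grid and block shape.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical host-side stand-in for CUDA's dim3 (z is always 1 here). */
typedef struct { int x, y; } Dim2;

/* Body of MatMultiply for a single thread, plus the bounds check the
   original kernel lacks (a grid can cover more than N x N threads). */
static void mat_multiply_thread(const int *a, const int *b, int *c, int N,
                                Dim2 block_idx, Dim2 block_dim, Dim2 thread_idx)
{
    int row = block_idx.y * block_dim.y + thread_idx.y;
    int col = block_idx.x * block_dim.x + thread_idx.x;
    if (row >= N || col >= N)
        return;
    int val = 0;
    for (int e = 0; e < N; ++e)
        val += a[row * N + e] * b[e * N + col];
    c[row * N + col] = val;
}

/* Run every (block, thread) pair of a launch <<<grid, block>>> sequentially. */
static void emulate_launch(Dim2 grid, Dim2 block,
                           const int *a, const int *b, int *c, int N)
{
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx) {
                    Dim2 bi = {bx, by}, ti = {tx, ty};
                    mat_multiply_thread(a, b, c, N, bi, block, ti);
                }
}

/* Enough T x T blocks to cover an N x N matrix (ceiling division). */
Dim2 grid_for(int N, int T)
{
    Dim2 g = {(N + T - 1) / T, (N + T - 1) / T};
    return g;
}

/* Fill a and b as the question does, emulate the launch, and return how many
   entries of c match a plain CPU product (-1 if allocation fails). */
int correct_entries(int N, Dim2 grid, Dim2 block)
{
    assert(N > 1); /* for N == 1 the only product is 0 and would match trivially */
    size_t n2 = (size_t)N * N;
    int *a = malloc(n2 * sizeof *a);
    int *b = malloc(n2 * sizeof *b);
    int *c = calloc(n2, sizeof *c); /* entries no thread writes stay 0 */
    int ok = -1;
    if (a && b && c) {
        for (size_t i = 0; i < n2; ++i)
            a[i] = b[i] = (int)i; /* same values as a[i*N+j] = j + i*N */
        emulate_launch(grid, block, a, b, c, N);
        ok = 0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                int ref = 0; /* always > 0 for these inputs when N > 1 */
                for (int e = 0; e < N; ++e)
                    ref += a[i * N + e] * b[e * N + j];
                ok += (c[i * N + j] == ref);
            }
    }
    free(a);
    free(b);
    free(c);
    return ok;
}
```

For example, `correct_entries(4, (Dim2){2, 1}, (Dim2){2, 1})` models `MatMultiply<<<B, T>>>` with N = 4, B = 2 and T = 2. Passing plain integers gives a one-dimensional launch, so `blockIdx.y` and `threadIdx.y` are always 0, every thread works on row 0, and only 4 of the 16 entries come out right. `correct_entries(4, grid_for(4, 2), (Dim2){2, 2})` covers all 16. Translated back to the CUDA program, this suggests launching with `MatMultiply<<<dimGrid, dimThreads>>>(dev_a, dev_b, dev_c, N);`, sizing `dimGrid` as `(N + T - 1) / T` blocks per dimension instead of reading B separately, adding a `row < N && col < N` guard to the kernel, and printing `c[i*N + j]` rather than `b[i*N + j]`, because the current output loop prints the unchanged input matrix `b`. T * T threads per block must also stay within the device limit (1024 on current GPUs).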