OpenCL Matrix Multiplication - Getting wrong answer
<p>Here's a simple OpenCL matrix multiplication kernel which is driving me crazy:</p>
<p>By the way, I am using pyopencl.</p>
<pre><code>__kernel void matrixMul(
    __global int* C, __global int* A, __global int* B, int wA, int wB){
    int row = get_global_id(1); //2D Thread ID x
    int col = get_global_id(0); //2D Thread ID y

    //Perform dot-product accumulated into value
    int value = 0;
    for ( int k = 0; k &lt; wA; k++ ){
        value += A[row*wA + k] * B[k*wB+col];
    }
    C[row*wA+col] = value; //Write to the device memory
}
</code></pre>
<p>Where (inputs)</p>
<pre><code>A = [72 45 75 61]
B = [26 53 46 76]
wA = wB = 2
</code></pre>
<p>Output I am getting:</p>
<p>Sometimes I get:</p>
<pre><code>C = [3942 0 0 5472]
</code></pre>
<p>Otherwise I get:</p>
<pre><code>C = [3942 7236 3312 5472]
</code></pre>
<p>But the output should be:</p>
<pre><code>C = [3942 7236 4756 8611]
</code></pre>
<p>I don't know what mistake I am making here. I have spent the entire day on this with no luck.</p>
<p>Please help me with this.</p>
<p>Here's the full Python code:</p>
<pre><code>import pyopencl as cl
import numpy as np
import os

ORDER = 2
LEN = ORDER*ORDER
ctx = cl.create_some_context()

commandQueue = cl.CommandQueue( ctx )

A = np.array((72, 45, 75, 61), dtype = np.int32)
B = np.array((26, 53, 46, 76), dtype = np.int32)
C = np.empty_like(A)

in_buf1 = cl.Buffer( ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf = A )
in_buf2 = cl.Buffer( ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf = B )
out_buf = cl.Buffer( ctx, cl.mem_flags.WRITE_ONLY, C.nbytes )

kernelSrc1 = """__kernel void matrixMul(
    /*const int Mdim, const int Ndim, const int Pdim,*/
    __global int* C, __global int* A, __global int* B, int wA, int wB)
{
    int row = get_global_id(1); //2D Thread ID x
    int col = get_global_id(0); //2D Thread ID y

    //Perform dot-product accumulated into value
    int value = 0;
    for ( int k = 0; k &lt; wA; k++ ){
        value += A[row*wA + k] * B[k*wB+col];
    }
    C[row*wA+col] = value; //Write to the device memory
}"""

program1 = cl.Program(ctx, kernelSrc1 ).build()
event1 = program1.matrixMul( commandQueue, (LEN, ), None, out_buf, in_buf1, in_buf2, np.int32(ORDER), np.int32(ORDER));
event1.wait()

cl.enqueue_copy(commandQueue, C, out_buf)
print C
</code></pre>
<p>I am using Python 2.7.x, pyopencl 2012.1, AMD APP SDK.</p>
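As a host-side sanity check, the expected product and the kernel's index arithmetic can be traced in plain Python, without a device. The sketch below is not part of the original post: `emulate_kernel` is a hypothetical helper, and treating out-of-range reads of `B` as 0 is an assumption (on a real device such reads return undefined data, which might account for the output changing between runs). It suggests, without being a definitive diagnosis, that a 1D global size such as `(LEN, )` leaves `get_global_id(1)` at 0 for every work-item, while a 2D global size such as `(ORDER, ORDER)` gives each work-item its own row and column.

```python
ORDER = 2
A = [72, 45, 75, 61]
B = [26, 53, 46, 76]

# Reference result: plain row-major matrix product computed on the host.
expected = [
    sum(A[r * ORDER + k] * B[k * ORDER + c] for k in range(ORDER))
    for r in range(ORDER)
    for c in range(ORDER)
]


def emulate_kernel(global_size, wA=ORDER, wB=ORDER):
    """Trace matrixMul's indexing for a given global size (hypothetical helper)."""
    # For a 1D launch, get_global_id(1) is 0 for every work-item.
    size0 = global_size[0]
    size1 = global_size[1] if len(global_size) > 1 else 1
    C = [0] * (wA * wB)
    for row in range(size1):        # stands in for get_global_id(1)
        for col in range(size0):    # stands in for get_global_id(0)
            value = 0
            for k in range(wA):
                idx = k * wB + col
                # Assumption: out-of-range reads yield 0; a real device
                # would return undefined data here.
                b = B[idx] if idx < len(B) else 0
                value += A[row * wA + k] * b
            out = row * wA + col
            if out < len(C):
                C[out] = value
    return C


print(expected)                           # [3942, 7236, 4756, 8611]
print(emulate_kernel((ORDER * ORDER,)))   # [3942, 7236, 3312, 5472]
print(emulate_kernel((ORDER, ORDER)))     # [3942, 7236, 4756, 8611]
```

Under these assumptions the 1D trace reproduces one of the wrong outputs reported above, and the 2D trace matches the expected result. In the pyopencl call, the equivalent experiment would be to pass `(ORDER, ORDER)` instead of `(LEN, )` as the global size.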
 
