Why is this simple OpenCL kernel running so slowly?
<p>I'm looking into OpenCL, and I'm a little confused about why this kernel runs so much more slowly than I would expect. Here's the kernel:</p>
<pre><code>__kernel void copy(
    const __global char* pSrc,
    __global __write_only char* pDst,
    int length)
{
    const int tid = get_global_id(0);
    if (tid &lt; length) {
        pDst[tid] = pSrc[tid];
    }
}
</code></pre>
<p>I've created the buffers in the following way:</p>
<pre><code>char* out = new char[2048*2048];
cl::Buffer(
    context,
    CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY,
    length,
    out);
</code></pre>
<p>Ditto for the input buffer, except that I've initialized the <code>in</code> pointer to random values. Finally, I run the kernel this way:</p>
<pre><code>cl::Event event;
queue.enqueueNDRangeKernel(
    kernel,
    cl::NullRange,
    cl::NDRange(length),
    cl::NDRange(1),
    NULL,
    &amp;event);

event.wait();
</code></pre>
<p>On average, the time is around 75 milliseconds, as calculated by:</p>
<pre><code>cl_ulong startTime = event.getProfilingInfo&lt;CL_PROFILING_COMMAND_START&gt;();
cl_ulong endTime = event.getProfilingInfo&lt;CL_PROFILING_COMMAND_END&gt;();
std::cout &lt;&lt; (endTime - startTime) * SECONDS_PER_NANO / SECONDS_PER_MILLI &lt;&lt; "\n";
</code></pre>
<p>I'm running Windows 7 with an Intel i5-3450 chip (Ivy Bridge architecture). For comparison, the "direct" way of doing the copy takes less than 5 milliseconds. I don't think <code>event.getProfilingInfo</code> includes the communication time between the host and device. Thoughts?</p>
<p>EDIT:</p>
<p>At the suggestion of ananthonline, I changed the kernel to use <code>float4</code>s instead of <code>char</code>s, and that dropped the average run time to about 50 milliseconds. Still not as fast as I would have hoped, but an improvement. Thanks, ananthonline!</p>