Note that there are some explanatory texts on larger screens.

plurals
  1. POOpenCL program works on the CPU but outputs empty image when run on the GPU
    primarykey
    data
    text
    <p>I have the following OpenCL Kernel, a Gaussian Blur</p> <pre><code>__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST; __constant float gaussian_kernel[3][3] = { {0.0625f, 0.125f, 0.0625f}, {0.125f, 0.25f, 0.125f}, {0.0625f, 0.125f, 0.0625f} }; void kernel gaussian_blur( read_only image2d_t input_image, write_only image2d_t output_image) { int x = get_global_id(0); int y = get_global_id(1); int2 coords[9] = { { x - 1, y - 1 }, { x, y - 1 }, { x + 1, y - 1 }, { x - 1, y }, { x, y }, { x + 1, y }, { x - 1, y + 1 }, { x, y + 1 }, { x + 1, y + 1 } }; float4 pixel_value = { 0.f, 0.f, 0.f, 0.f }; for(int i = 0; i &lt; 3; ++i) { for(int j = 0; j &lt; 3; ++j) { int index = i * 3 + j; float4 blurred = as_float4(read_imageui(input_image, sampler, coords[index])); pixel_value.x += (blurred.x * gaussian_kernel[i][j]); pixel_value.y += (blurred.y * gaussian_kernel[i][j]); pixel_value.z += (blurred.z * gaussian_kernel[i][j]); pixel_value.w += (blurred.w * gaussian_kernel[i][j]); } } uint4 final_value = as_uint4(pixel_value); write_imageui(output_image, coords[4], final_value); } </code></pre> <p>When I specify the device to use as the CPU, The blur works properly. Here's the device selection code</p> <pre><code>std::vector&lt;cl::Platform&gt; all_platforms; cl::Platform::get(&amp;all_platforms); if(all_platforms.size() == 0) { std::cerr &lt;&lt; "No platforms available" &lt;&lt;std::endl; exit(-1); } cl::Platform default_platform = all_platforms[0]; std::vector&lt;cl::Device&gt; all_devices; default_platform.getDevices(CL_DEVICE_TYPE_ALL, &amp;all_devices); if(all_devices.size() == 0) { std::cerr &lt;&lt; "No device found" &lt;&lt; std::endl; exit(-1); } cl::Device default_device = all_devices[1]; //Changing this index to 0 uses my graphics card </code></pre> <p>Now, if the default_device is set to the GPU, only an empty image is output by the program. The relevant image setup code is (note that <code>input</code> is a <code>Magick::Image</code> and <code>in_pixels</code> a heap allocated array of <code>unsigned short</code>):</p> <pre><code>cl::ImageFormat format(CL_RGBA, CL_UNSIGNED_INT16); cl::Image2D input_image_buffer; input.write(0, 0, input.baseColumns(), input.baseRows(), "RGBA", Magick::ShortPixel, in_pixels); input_image_buffer = cl::Image2D( context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, format, input.baseColumns(), input.baseRows(), 0, in_pixels, &amp;cl_error); cl::Image2D output_image_buffer; output_image_buffer = cl::Image2D( context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, format, input.baseColumns(), input.baseRows(), 0, out_pixels, &amp;cl_error); </code></pre> <p>And the Kernel setup/Image output code (<code>gaussian_program</code> was built without error of course) </p> <pre><code>cl::Kernel gaussian_kernel(gaussian_program, "gaussian_blur"); cl::CommandQueue queue(context, default_device, 0, &amp;cl_error); cl::size_t&lt;3&gt; origin; cl::size_t&lt;3&gt; size; origin[0] = 0; origin[1] = 0; origin[2] = 0; size[0] = input.baseColumns(); size[1] = input.baseRows(); size[2] = 1; cl_error = gaussian_kernel.setArg(0, input_image_buffer); cl_error = gaussian_kernel.setArg(1, output_image_buffer); cl::NDRange range(input.baseColumns(), input.baseRows()); cl_error = queue.enqueueNDRangeKernel( gaussian_kernel, cl::NullRange, range, cl::NullRange); queue.finish(); try{ output.read( input.baseColumns(), input.baseRows(), "RGBA", Magick::ShortPixel, out_pixels); } catch(Magick::Exception&amp; ex) { std::cerr &lt;&lt; "A Magick error occured while writing the pixel cache: " &lt;&lt; std::endl &lt;&lt; ex.what() &lt;&lt; std::endl; return false; } </code></pre> <p>Now, I removed a lot of error checking for the purpose of this example, but the original code checks cl_error after every OpenCL call and never signals an error. The code executes as expected on the CPU but the image is empty when the code is executed on the GPU.</p> <p>I suspected a synchronization issue at first (the <code>queue.finish()</code> call is required for that precise purpose, even on the CPU) but littering the code with <code>cl::finish()</code> or <code>queue.finish()</code> calls in an attempt to serialize the execution didn't help at all.</p> <p>Is there something I'm clearly doing wrong? Is there a potential reason for this OpenCL kernel to fail on the GPU but not on the CPU?</p> <p>For the record, I'm on Ubuntu 13.04 using the AMD APP SDK OpenCL implementation with a Radeon HD 7970.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload