2D Image Indexing Bug in CUDA Kernel
<p>I'm doing linear filtering on images using CUDA. I use 2D thread blocks and a 2D grid so that the launch maps naturally onto the image. Here is how I set up the launch (<strong>height</strong> and <strong>width</strong> are the image dimensions):</p>
<pre><code>dim3 BlockDim(16, 16);
dim3 GridDim;
GridDim.x = (width + 15) / 16;
GridDim.y = (height + 15) / 16;
</code></pre>
<p>In the kernel I compute each thread's pixel location as follows:</p>
<pre><code>unsigned int xIndex = blockIdx.x * 16 + threadIdx.x;
unsigned int yIndex = blockIdx.y * 16 + threadIdx.y;
unsigned int tid = yIndex * width + xIndex;
</code></pre>
<p>I also want threads on the four borders to return early (I'll handle the borders later on). I do this as:</p>
<pre><code>if (yIndex &gt;= height - N || xIndex &gt;= width - N || yIndex &lt; N || xIndex &lt; N)
    return;
</code></pre>
<p>where N is the number of pixels along each border that I don't want to compute.</p>
<p><strong>Problem:</strong></p>
<p>The code runs fine on all standard image sizes, but for some arbitrary sizes the output shows one or more diagonal lines. For example, a 500x333 image gives correct output (even though neither dimension is a multiple of 16), whereas a 450x365 image shows diagonal lines. The problem remains even if the only early return is for the extra grid threads that fall outside the image:</p>
<pre><code>if (yIndex &gt;= height || xIndex &gt;= width)
    return;
</code></pre>
<p>With the code otherwise unchanged, some inputs run fine while others don't. Can anybody spot the bug? I have attached the input and output samples here: <a href="http://www.hardwareinsight.com/IMAGES.zip" rel="nofollow">IMAGES</a>. Thanks!</p>
<p><strong>Update:</strong></p>
<p>Kernel code (simplified to copy the input image to the output; it still shows the same problem):</p>
<pre><code>__global__ void filter_8u_c1_kernel(unsigned char* in, unsigned char* out,
                                    int width, int height,
                                    float* filter, int fSize)
{
    unsigned int xIndex = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    unsigned int yIndex = blockIdx.y * BLOCK_SIZE + threadIdx.y;
    unsigned int tid = yIndex * width + xIndex;
    unsigned int N = fSize / 2;  // border width: half the filter size, rounded down

    if (yIndex &gt;= height - N || xIndex &gt;= width - N || yIndex &lt; N || xIndex &lt; N)
        return;

    /* Filter code removed; still gives the same problem */
    out[tid] = in[tid];
}
</code></pre>
<p><strong>Update 2:</strong></p>
<p>I have also removed the <strong>return</strong> statement by inverting the <strong>if</strong> condition, but the problem persists.</p>
<pre><code>if (yIndex &lt; height - N &amp;&amp; xIndex &lt; width - N &amp;&amp; yIndex &gt;= N &amp;&amp; xIndex &gt;= N) {
    /* Kernel code */
}
</code></pre>
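<p>A host-side sketch for checking the indexing without a GPU: it replays the question's launch configuration and the simplified copy kernel from the Update (<strong>filter_8u_c1_kernel</strong>) thread by thread in plain C++, which nvcc also accepts. It rests on an assumption the question does not confirm: that the host image buffer pads each row to a multiple of 4 bytes, as BMP files do. 500 is a multiple of 4 and 450 is not, so under that assumption only the 450x365 image has a row stride (452 bytes) that differs from <strong>width</strong>. The names <code>paddedPitch</code>, <code>copyThread</code>, <code>countBadInteriorPixels</code>, <code>bufferPitch</code> and <code>indexPitch</code> are hypothetical and introduced only for this sketch.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned int BLOCK_SIZE = 16;

// Hypothetical helper (assumption): row stride in bytes of an 8-bit,
// single-channel image whose rows are padded to a multiple of 4 bytes.
int paddedPitch(int width) { return (width + 3) / 4 * 4; }

// Emulates one thread of the simplified copy kernel.
// indexPitch is the row stride used to compute tid; the kernel uses width.
void copyThread(const unsigned char* in, unsigned char* out,
                int width, int height, int indexPitch, int fSize,
                unsigned int bx, unsigned int by,
                unsigned int tx, unsigned int ty) {
    unsigned int xIndex = bx * BLOCK_SIZE + tx;
    unsigned int yIndex = by * BLOCK_SIZE + ty;
    unsigned int N = fSize / 2;
    // Same border test as the kernel.
    if (yIndex >= height - N || xIndex >= width - N || yIndex < N || xIndex < N)
        return;
    unsigned int tid = yIndex * indexPitch + xIndex;
    out[tid] = in[tid];
}

// Replays every (block, thread) pair of BlockDim(16, 16) and
// GridDim((width + 15) / 16, (height + 15) / 16), then counts interior pixels
// that differ from the input when read back with the buffer's real stride.
int countBadInteriorPixels(int width, int height, int bufferPitch,
                           int indexPitch, int fSize) {
    const int N = fSize / 2;
    assert(2 * N < width && 2 * N < height);
    assert(width <= indexPitch && indexPitch <= bufferPitch);  // keeps tid in range

    std::vector<unsigned char> in(static_cast<std::size_t>(bufferPitch) * height);
    std::vector<unsigned char> out(in.size(), 0);
    for (std::size_t i = 0; i < in.size(); ++i)
        in[i] = static_cast<unsigned char>(i % 251 + 1);  // never 0, so uncopied bytes show up

    const unsigned int gridX = (width + BLOCK_SIZE - 1) / BLOCK_SIZE;
    const unsigned int gridY = (height + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (unsigned int by = 0; by < gridY; ++by)
        for (unsigned int bx = 0; bx < gridX; ++bx)
            for (unsigned int ty = 0; ty < BLOCK_SIZE; ++ty)
                for (unsigned int tx = 0; tx < BLOCK_SIZE; ++tx)
                    copyThread(in.data(), out.data(), width, height,
                               indexPitch, fSize, bx, by, tx, ty);

    // Read back in 2D using the buffer's actual row stride.
    int bad = 0;
    for (int y = N; y < height - N; ++y)
        for (int x = N; x < width - N; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * bufferPitch + x;
            if (out[i] != in[i]) ++bad;
        }
    return bad;
}
```

<p>Under these assumptions, <code>countBadInteriorPixels(450, 365, 452, 450, 3)</code>, which indexes by width as the kernel does, reports a nonzero count, while <code>countBadInteriorPixels(450, 365, 452, 452, 3)</code> and <code>countBadInteriorPixels(500, 333, 500, 500, 3)</code> report 0. In the padded layout, the skipped border bytes of each logical row land 2 pixels further left on each successive displayed row, which is the kind of pattern that appears as a diagonal line. If the padding assumption holds, computing <strong>tid</strong> with the buffer's actual row stride instead of <strong>width</strong> is one way to make the two layouts agree.</p>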