<p>There are quite a few things you still haven't described very well, but based on the information you have posted, I built what I am guessing is a reasonable repro case with parameters that match a case you say is failing (450 x 364 with <code>filterSize=5</code>):</p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

// Check every runtime API call without assert(): under NDEBUG an assert()
// wrapped around the call would remove the call itself.
#define CUDA_CHECK(call) \
    do { \
        cudaError_t err_ = (call); \
        if (err_ != cudaSuccess) { \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)

template&lt;int filterSize&gt;
__global__ void filter_8u_c1_kernel(unsigned char* in, unsigned char* out, int width, int height, float* filter, int fSize)
{
    unsigned int xIndex = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int yIndex = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int tid = yIndex * width + xIndex;
    unsigned int N = filterSize/2;

    // Threads within N pixels of any edge (or outside the image) do nothing.
    if(yIndex&gt;=height-N || xIndex&gt;=width-N || yIndex&lt;N || xIndex&lt;N)
        return;

    out[tid] = in[tid];
}

int main(void)
{
    const int width = 450, height = 364, filterSize=5;
    const size_t isize = sizeof(unsigned char) * size_t(width * height);
    unsigned char * _in, * _out, * out;

    CUDA_CHECK( cudaMalloc((void **)&amp;_in, isize) );
    CUDA_CHECK( cudaMalloc((void **)&amp;_out, isize) );
    CUDA_CHECK( cudaMemset(_in, 'Z', isize) );
    CUDA_CHECK( cudaMemset(_out, 'A', isize) );

    // Round the grid up so every pixel is covered by a thread.
    const dim3 BlockDim(16,16);
    dim3 GridDim;
    GridDim.x = (width + BlockDim.x - 1) / BlockDim.x;
    GridDim.y = (height + BlockDim.y - 1) / BlockDim.y;

    // The filter weights are unused by this stripped-down kernel, so pass nulls.
    filter_8u_c1_kernel&lt;filterSize&gt;&lt;&lt;&lt;GridDim,BlockDim&gt;&gt;&gt;(_in,_out,width,height,0,0);
    CUDA_CHECK( cudaPeekAtLastError() );

    out = (unsigned char *)malloc(isize);
    // cudaMemcpy waits for the kernel to finish before copying.
    CUDA_CHECK( cudaMemcpy(out, _out, isize, cudaMemcpyDeviceToHost) );

    // Print one line per column i, walking down the rows j.
    for(int i=0; i&lt;width; i++) {
        fprintf(stdout, "%d: ", i);
        for(int j=0; j&lt;height; j++) {
            unsigned int idx = i + j*width;
            fprintf(stdout, "%c", out[idx]);
        }
        fprintf(stdout, "\n");
    }

    free(out);
    CUDA_CHECK( cudaFree(_in) );
    CUDA_CHECK( cudaFree(_out) );
    return cudaThreadExit();
}
</code></pre>
<p>When run, it does exactly what I would expect: it overwrites the output memory with the input everywhere except the first and last two lines, and the first and last two entries in each of the lines in between. This is running with CUDA 3.2 on OS X 10.6.5 with a compute capability 1.2 GPU. So whatever is happening in your code, it isn't happening in my repro case, which means either I have misinterpreted what you have written, or something you haven't described is causing the problem.</p>
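<p>Rather than eyeballing the printed columns, the border pattern can be checked on the host. This is a sketch that is not part of the original answer: <code>count_mismatches</code> and <code>emulate_kernel</code> are hypothetical helpers, and the check assumes the same fill bytes as the repro ('Z' for the input buffer, 'A' for the output buffer). A pixel is treated as interior when it is at least <code>N = filterSize/2</code> away from every edge, which is the complement of the kernel's early-exit test.</p>

```cpp
#include <cassert>

// Hypothetical helper (not part of the original answer): counts the pixels whose
// value differs from what the kernel's early-exit test implies. Pixels at least
// N = filterSize/2 away from every edge should hold the input byte (inVal); all
// other pixels should still hold the byte the output buffer was filled with (fillVal).
long count_mismatches(const unsigned char* out, int width, int height,
                      int filterSize, unsigned char inVal, unsigned char fillVal)
{
    const int N = filterSize / 2;
    assert(width > 2 * N && height > 2 * N);  // the image must have an interior

    long bad = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const bool interior = x >= N && x < width - N && y >= N && y < height - N;
            const unsigned char expected = interior ? inVal : fillVal;
            if (out[y * width + x] != expected)
                ++bad;
        }
    }
    return bad;
}

// Hypothetical host-only stand-in for filter_8u_c1_kernel: applies the same
// boundary test (including its unsigned arithmetic) to every (x, y), so the
// checker can be exercised without a GPU.
void emulate_kernel(const unsigned char* in, unsigned char* out,
                    int width, int height, int filterSize)
{
    const unsigned int N = filterSize / 2;
    for (unsigned int yIndex = 0; yIndex < static_cast<unsigned int>(height); ++yIndex) {
        for (unsigned int xIndex = 0; xIndex < static_cast<unsigned int>(width); ++xIndex) {
            if (yIndex >= height - N || xIndex >= width - N || yIndex < N || xIndex < N)
                continue;  // corresponds to the kernel's early return
            const unsigned int tid = yIndex * width + xIndex;
            out[tid] = in[tid];
        }
    }
}
```

<p>In the repro, calling <code>count_mismatches(out, width, height, filterSize, 'Z', 'A')</code> after the <code>cudaMemcpy</code> should return 0 if the kernel behaves as described; <code>emulate_kernel</code> exists only so the check itself can be exercised without a GPU.</p>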
 
