CUDA kernel launch fails when using various offsets into input data
<p>My code is giving an error message and I am trying to track down the cause of it. To make it easier to find the problem, I have stripped away code that apparently is not relevant to causing the error message. If you can tell me why the following simple code produces an error message, then I think I should be able to fix my original code:</p>
<pre><code>#include "cuComplex.h"
#include &lt;cutil.h&gt;

__device__ void compute_energy(void *data, int isample, int nsamples) {
  cuDoubleComplex * const nminusarray = (cuDoubleComplex*)data;
  cuDoubleComplex * const f = (cuDoubleComplex*)(nminusarray+101);
  double * const abs_est_errorrow_all = (double*)(f+3);
  double * const rel_est_errorrow_all = (double*)(abs_est_errorrow_all+nsamples*51);
  int * const iid_all = (int*)(rel_est_errorrow_all+nsamples*51);
  int * const iiu_all = (int*)(iid_all+nsamples*21);
  int * const piv_all = (int*)(iiu_all+nsamples*21);
  cuDoubleComplex * const energyrow_all = (cuDoubleComplex*)(piv_all+nsamples*12);
  cuDoubleComplex * const refinedenergyrow_all = (cuDoubleComplex*)(energyrow_all+nsamples*51);
  cuDoubleComplex * const btplus_all = (cuDoubleComplex*)(refinedenergyrow_all+nsamples*51);

  cuDoubleComplex * const btplus = btplus_all+isample*21021;

  btplus[0] = make_cuDoubleComplex(0.0, 0.0);
}

__global__ void computeLamHeight(void *data, int nlambda) {
  compute_energy(data, blockIdx.x, nlambda);
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&amp;device_data, 184465640));
  computeLamHeight&lt;&lt;&lt;dim3(101, 1, 1), dim3(512, 1, 1), 45000&gt;&gt;&gt;(device_data, 101);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}
</code></pre>
<p>I am using a GeForce GTX 480 and I am compiling the code like so:</p>
<pre><code>nvcc -L /soft/cuda-sdk/4.0.17/C/lib -I /soft/cuda-sdk/4.0.17/C/common/inc -lcutil_x86_64 -arch sm_13 -O3 -Xopencc "-Wall" Main.cu
</code></pre>
<p>The output is:</p>
<pre><code>Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 31 : unspecified launch failure.
</code></pre>
<p>EDIT: I have now further simplified the code. The following simpler code still produces the error message:</p>
<pre><code>#include &lt;cutil.h&gt;

__global__ void compute_energy(void *data) {
  *(double*)((int*)data+101) = 0.0;
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&amp;device_data, 101*sizeof(int)+sizeof(double)));
  compute_energy&lt;&lt;&lt;dim3(1, 1, 1), dim3(1, 1, 1)&gt;&gt;&gt;(device_data);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}
</code></pre>
<p>Now it is easy to see that the offset should be valid. I tried running cuda-memcheck and it says the following:</p>
<pre><code>========= CUDA-MEMCHECK
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 13 : unspecified launch failure.
========= Invalid __global__ write of size 8
=========     at 0x00000020 in compute_energy
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x200200194 is misaligned
=========
========= ERROR SUMMARY: 1 error
</code></pre>
<p>I tried searching the internet to find what is meant by the address being misaligned, but I failed to find an explanation. What is the deal?</p>
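The cuda-memcheck report looks consistent with an alignment problem rather than an out-of-bounds write: the offset lies inside the allocation, but CUDA expects an 8-byte access to land on an address that is a multiple of 8, and 101 `int`s put the `double` at byte 404. Below is a minimal host-side sketch of that arithmetic. It assumes `sizeof(int)` is 4 on the device and that the pointer returned by `cudaMalloc` equals the faulting address minus the offset. The names `is_aligned`, `align_up`, and `print_alignment_report` are illustrative helpers, not part of CUDA or cutil.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// True when `address` is a multiple of `alignment` (both in bytes).
constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment) {
    return address % alignment == 0;
}

// Rounds `offset` up to the next multiple of `alignment` (both in bytes).
constexpr std::uint64_t align_up(std::uint64_t offset, std::uint64_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Address reported by cuda-memcheck in the simplified example.
constexpr std::uint64_t kFaultAddress = 0x200200194ULL;
// Assumption: int is 4 bytes and double is 8 bytes on the device.
constexpr std::uint64_t kIntSize = 4;
constexpr std::uint64_t kDoubleSize = 8;
// Byte offset produced by (int*)data + 101.
constexpr std::uint64_t kByteOffset = 101 * kIntSize;
// Assumption: the allocation base is the faulting address minus that offset.
constexpr std::uint64_t kBase = kFaultAddress - kByteOffset;

// Hypothetical helper: prints the arithmetic behind the "misaligned" message.
void print_alignment_report() {
    const std::uint64_t padded = align_up(kByteOffset, kDoubleSize);
    // The padded offset must itself be a valid place for a double.
    assert(is_aligned(kBase + padded, kDoubleSize));

    std::printf("allocation base     : 0x%llx\n", (unsigned long long)kBase);
    std::printf("byte offset         : %llu\n", (unsigned long long)kByteOffset);
    std::printf("write 8-byte aligned: %s\n",
                is_aligned(kFaultAddress, kDoubleSize) ? "yes" : "no");
    std::printf("padded byte offset  : %llu\n", (unsigned long long)padded);
}
```

Calling `print_alignment_report()` from a `main` function prints the values. If this reading is right, padding the `int` region so that the `double` starts at byte 408, or laying out the buffer with the widest types first, should avoid the misaligned write.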