<p>Spend some time with the extensive documentation NVIDIA provides.</p>
<p>From the Programming Guide:</p>
<pre><code>float* devPtr;
cudaMalloc((void**)&amp;devPtr, 256 * sizeof(*devPtr));
cudaMemset(devPtr, 0, 256 * sizeof(*devPtr));
</code></pre>
<p>That's a simple example of how to allocate device memory. Now, your kernels should accept a pointer to float, like so:</p>
<pre><code>__global__ void kernel1(float *some_neat_data)
{
    some_neat_data[threadIdx.x]++;
}

__global__ void kernel2(float *potentially_that_same_neat_data)
{
    potentially_that_same_neat_data[threadIdx.x] *= 0.3f;
}
</code></pre>
<p>So now you can invoke them like so:</p>
<pre><code>float* devPtr;
cudaMalloc((void**)&amp;devPtr, 256 * sizeof(*devPtr));
cudaMemset(devPtr, 0, 256 * sizeof(*devPtr));

// Both kernels receive the same device pointer, so they share the data.
kernel1&lt;&lt;&lt;1,128&gt;&gt;&gt;(devPtr);
kernel2&lt;&lt;&lt;1,128&gt;&gt;&gt;(devPtr);
</code></pre>
<blockquote>
<p>As this data is used in numerous functions, I would like it to be global.</p>
</blockquote>
<p>There are few good reasons to use globals. This definitely is not one. I'll leave it as an exercise to expand this example to include moving "devPtr" to a global scope.</p>
<p>EDIT:</p>
<p>OK, the fundamental problem is this: your kernels can only access device memory, and the only global-scope pointers they can use are ones that live on the GPU. When you call a kernel from the CPU, what happens behind the scenes is that the pointers and primitives you pass as arguments get copied into GPU registers and/or shared memory before the kernel executes.</p>
<p>So the closest I can suggest is this: use cudaMemcpyToSymbol() to achieve your goals. But keep in mind that a different approach might be the Right Thing.</p>
<pre><code>#include &lt;algorithm&gt;

// Global-scope array in constant memory: readable by every kernel,
// writable from the host only through cudaMemcpyToSymbol().
__constant__ float devPtr[1024];

__global__ void kernel1(float *some_neat_data)
{
    some_neat_data[threadIdx.x] = devPtr[0] * devPtr[1];
}

__global__ void kernel2(float *potentially_that_same_neat_data)
{
    potentially_that_same_neat_data[threadIdx.x] *= devPtr[2];
}

int main(int argc, char *argv[])
{
    float some_data[256];
    for (size_t i = 0; i &lt; sizeof(some_data) / sizeof(some_data[0]); i++)
    {
        some_data[i] = i * 2;
    }

    // Copy no more bytes than either the source or the symbol can hold.
    cudaMemcpyToSymbol(devPtr, some_data, std::min(sizeof(some_data), sizeof(devPtr)));

    float* otherDevPtr;
    cudaMalloc((void**)&amp;otherDevPtr, 256 * sizeof(*otherDevPtr));
    cudaMemset(otherDevPtr, 0, 256 * sizeof(*otherDevPtr));

    kernel1&lt;&lt;&lt;1,128&gt;&gt;&gt;(otherDevPtr);
    kernel2&lt;&lt;&lt;1,128&gt;&gt;&gt;(otherDevPtr);

    cudaFree(otherDevPtr);
    return 0;
}
</code></pre>
<p>Don't forget '--host-compilation=c++' for this example.</p>
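<p>If you really do want a global-scope pointer to memory obtained from cudaMalloc(), one possible approach is to declare the pointer itself as a <code>__device__</code> variable and copy the address into it with cudaMemcpyToSymbol(). The following is only a minimal sketch under that assumption, not necessarily the best design: the names <code>g_neat_data</code>, <code>check</code> and <code>run_with_global_pointer</code> are made up for illustration, and <code>n</code> is assumed to fit in a single thread block.</p>

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical name: a global-scope pointer that lives on the device.
// Kernels read it directly; the host can only set it through the symbol API.
__device__ float *g_neat_data;

__global__ void kernel1()
{
    g_neat_data[threadIdx.x] += 1.0f;
}

__global__ void kernel2()
{
    g_neat_data[threadIdx.x] *= 0.3f;
}

// Hypothetical helper: report a failed runtime call instead of ignoring it.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Allocates n floats (n assumed to be at most the block-size limit),
// publishes the address through g_neat_data, runs both kernels and copies
// the results back. Returns an empty vector if any step fails.
std::vector<float> run_with_global_pointer(int n)
{
    std::vector<float> host(n, 0.0f);
    float *devPtr = nullptr;

    if (!check(cudaMalloc((void **)&devPtr, n * sizeof(*devPtr)), "cudaMalloc")) {
        return std::vector<float>();
    }

    bool ok = check(cudaMemset(devPtr, 0, n * sizeof(*devPtr)), "cudaMemset");

    // Copy the pointer value itself (not the data) into the device symbol.
    ok = ok && check(cudaMemcpyToSymbol(g_neat_data, &devPtr, sizeof(devPtr)),
                     "cudaMemcpyToSymbol");

    if (ok) {
        kernel1<<<1, n>>>();
        kernel2<<<1, n>>>();
        ok = check(cudaGetLastError(), "kernel launch");
        // cudaMemcpy waits for the preceding kernels before copying.
        ok = ok && check(cudaMemcpy(host.data(), devPtr, n * sizeof(float),
                                    cudaMemcpyDeviceToHost), "cudaMemcpy");
    }

    cudaFree(devPtr);
    if (!ok) {
        host.clear();
    }
    return host;
}
```

<p>Calling <code>run_with_global_pointer(128)</code> mirrors the 128-thread launches above. The kernels no longer take a parameter, which is exactly the hidden coupling that makes globals hard to reason about, so passing the pointer as an argument is still the approach I'd recommend.</p>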
 
