
Proper way to write kernel functions in CUDA?
<p>I am just about to embark on converting a program I wrote into CUDA to hopefully increase processing speed.</p>
<p>Now obviously my old program executes many functions one after the other, and I have separated these functions in my main program and call each one in order.</p>
<pre><code>int main() {
    /* initialization of variables */
    function1();
    function2();  // depends on the results of function1
    function3();
    /* print result */
    return 0;
}
</code></pre>
<p>These functions are inherently serial, as function2 depends on the results of function1.</p>
<p>Alright, so now I want to convert these functions into kernels and run the tasks within each function in parallel.</p>
<p>Is it as simple as rewriting each function in a parallel way and then, in my main program, calling each kernel one after the other? Is this slower than it needs to be? For example, can I have my GPU directly execute the next parallel operation without going back to the CPU to launch the next kernel?</p>
<p>Obviously I will keep all run-time variables in GPU memory to limit the amount of data transfer, so should I even worry about the time it takes between kernel calls?</p>
<p>I hope this question is clear; if not, please ask me to elaborate. Thanks.</p>
<p>And here is an extra question so that I can check my sanity. Ultimately this program's input is a video file, and through the different functions, each frame will lead to a result. My plan is to grab multiple frames at a time (say 8 unique frames), divide the total number of blocks I have among these 8 frames, and then have the threads in each block perform further parallel operations on the image data, such as vector addition, Fourier transforms, etc.<br>Is this the right way to approach the problem?</p>
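A minimal sketch of the structure the question describes, assuming one `float` array that stays in device memory for the whole run. The kernels `stage1`, `stage2` and `stage3` and their arithmetic are hypothetical stand-ins for function1 through function3. It relies on two documented CUDA behaviours: a kernel launch returns to the CPU immediately, and kernels launched on the same stream run in launch order, so each stage sees the previous stage's output without any copy back to the host.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Threads per block: an assumed value, to be tuned for the real workload.
constexpr int kThreads = 256;

// Number of blocks needed to cover n elements (ceiling division).
constexpr int blocksFor(int n, int threads) { return (n + threads - 1) / threads; }

// Hypothetical stand-ins for function1..function3. Each reads the previous
// stage's result from device memory, so the stages remain serial.
__global__ void stage1(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f;
}

__global__ void stage2(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] + 1.0f;
}

__global__ void stage3(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Checks a CUDA runtime call and exits main with an error message on failure.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            return 1;                                                \
        }                                                            \
    } while (0)

int main() {
    const int n = 1 << 20;
    const int blocks = blocksFor(n, kThreads);

    // Initialization of variables: fill on the host, copy to the GPU once.
    std::vector<float> h(n, 1.0f);
    float* d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // Each launch returns immediately; the default stream runs them in order.
    stage1<<<blocks, kThreads>>>(d, n);
    stage2<<<blocks, kThreads>>>(d, n);
    stage3<<<blocks, kThreads>>>(d, n);
    CHECK(cudaGetLastError());

    // The blocking copy waits for all three kernels before reading results.
    CHECK(cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost));
    std::printf("result[0] = %f\n", h[0]);  // ((1 * 2) + 1)^2 = 9

    CHECK(cudaFree(d));
    return 0;
}
```

Because the launches are asynchronous, the CPU can usually queue the next kernel while the previous one is still running, so the gap between kernels tends to matter mainly when the individual kernels are very short.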
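If the time between kernel calls does turn out to matter, one option not mentioned in the question is CUDA Graphs: record the kernel sequence once with stream capture, then replay it with a single `cudaGraphLaunch` per batch of frames. This sketch assumes CUDA 11.4 or later (for `cudaGraphInstantiateWithFlags`); the kernels `scale` and `offset` are hypothetical stand-ins for two of the question's functions, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // assumed block size
constexpr int blocksFor(int n, int threads) { return (n + threads - 1) / threads; }

// Hypothetical stages standing in for function1 and function2.
__global__ void scale(float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

__global__ void offset(float* data, int n, float o) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += o;
}

int main() {
    const int n = 1 << 20;
    const int blocks = blocksFor(n, kThreads);
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Stream capture requires a non-default stream.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record the kernel sequence into a graph instead of running it.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    scale<<<blocks, kThreads, 0, stream>>>(d, n, 2.0f);
    offset<<<blocks, kThreads, 0, stream>>>(d, n, 1.0f);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then replay the whole sequence with one call per batch.
    cudaGraphExec_t exec;
    cudaGraphInstantiateWithFlags(&exec, graph, 0);
    const int kBatches = 3;  // assumed; a real pipeline would loop over frames
    for (int batch = 0; batch < kBatches; ++batch) {
        cudaGraphLaunch(exec, stream);
    }
    cudaStreamSynchronize(stream);

    float first = 0.0f;
    cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("d[0] = %f\n", first);  // 0 -> 1 -> 3 -> 7

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return 0;
}
```

The graph is built once and reused, so the per-batch host work shrinks to one launch call regardless of how many kernels the pipeline contains.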
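For the frame-batching plan in the extra question, one common layout (an assumption here, not the only choice) is a 2D grid: `blockIdx.y` selects the frame within the batch, while `blockIdx.x` and `threadIdx.x` select the pixel, with the 8 frames stored back to back in one device buffer. The kernel `brighten`, the 640x480 frame size and the single-channel `float` pixels are all hypothetical; error checking is omitted for brevity.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;        // assumed block size
constexpr int kFramesPerBatch = 8;   // the question's "say 8 unique frames"
constexpr int blocksFor(int n, int threads) { return (n + threads - 1) / threads; }

// Hypothetical per-pixel operation. Frame f starts at
// frames + f * pixelsPerFrame; blockIdx.y chooses f.
__global__ void brighten(float* frames, int pixelsPerFrame, float amount) {
    int frame = blockIdx.y;
    int pixel = blockIdx.x * blockDim.x + threadIdx.x;
    if (pixel < pixelsPerFrame) {
        frames[static_cast<size_t>(frame) * pixelsPerFrame + pixel] += amount;
    }
}

int main() {
    const int width = 640, height = 480;  // assumed frame size
    const int pixels = width * height;
    const size_t total = static_cast<size_t>(pixels) * kFramesPerBatch;

    float* d = nullptr;
    cudaMalloc(&d, total * sizeof(float));
    cudaMemset(d, 0, total * sizeof(float));

    // x covers the pixels of one frame; y gives each frame its own slice of blocks.
    dim3 grid(blocksFor(pixels, kThreads), kFramesPerBatch);
    brighten<<<grid, kThreads>>>(d, pixels, 0.5f);

    // Read back the last pixel of the last frame (frame index 7).
    float last = 0.0f;
    cudaMemcpy(&last, d + total - 1, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("last pixel of frame 7 = %f\n", last);  // 0 + 0.5

    cudaFree(d);
    return 0;
}
```

Each frame gets `gridDim.x` blocks of its own, which matches the plan of dividing blocks among frames. For the Fourier transforms, cuFFT's `cufftPlanMany` can plan a batched transform over several same-sized arrays, which fits this back-to-back layout.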
 
