
  1. GPU programming via JOCL uses only 6 out of 80 shader cores?
    <p>I am trying to run a program on my GPU. To start with an easy example, I modified the first sample on <a href="http://www.jocl.org/samples/samples.html" rel="nofollow">http://www.jocl.org/samples/samples.html</a> to run the following little script: I run n simultaneous "threads" (what's the correct name for the GPU equivalent of a thread?), each of which performs 20000000/n independent tanh() computations. You can see my code here: <a href="http://pastebin.com/DY2pdJzL" rel="nofollow">http://pastebin.com/DY2pdJzL</a></p> <p>The speed is far from what I expected:</p> <ul> <li>for n=1 it takes 12.2 seconds</li> <li>for n=2 it takes 6.3 seconds</li> <li>for n=3 it takes 4.4 seconds</li> <li>for n=4 it takes 3.4 seconds</li> <li>for n=5 it takes 3.1 seconds</li> <li>for n=6 and beyond, it takes 2.7 seconds.</li> </ul> <p>So from n=6 onward (be it n=8, n=20, n=100, n=1000 or n=100000), there is no further performance increase, which suggests that only 6 of these are computed in parallel. However, according to the specifications of my card, there should be 80 cores: <a href="http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5450-overview/pages/hd-5450-overview.aspx#2" rel="nofollow">http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5450-overview/pages/hd-5450-overview.aspx#2</a></p> <p>It is not a matter of overhead, since increasing or decreasing the 20000000 only scales all the execution times by a linear factor.</p> <p>I have installed the AMD APP SDK and drivers that support OpenCL: see <a href="http://dl.dropbox.com/u/3060536/prtscr.png" rel="nofollow">http://dl.dropbox.com/u/3060536/prtscr.png</a> and <a href="http://dl.dropbox.com/u/3060536/prtsrc2.png" rel="nofollow">http://dl.dropbox.com/u/3060536/prtsrc2.png</a> for details (or at least I conclude from these that OpenCL is running correctly).</p> <p>So I'm a bit clueless now about where to look for an answer. Why can JOCL only do 6 parallel executions on my ATI Radeon HD 5450?</p>
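To make the scaling in the timings above easier to see, here is a minimal sketch in plain Java (no JOCL required) that turns the measured times into speedup and parallel efficiency relative to n=1. The class name `SpeedupTable` and the helper methods `speedup` and `efficiency` are hypothetical names chosen for illustration; only the six timing values come from the question.

```java
public class SpeedupTable {
    // Wall-clock times in seconds reported in the question for n = 1..6.
    static final double[] SECONDS = {12.2, 6.3, 4.4, 3.4, 3.1, 2.7};

    // Speedup relative to the single-work-item run (hypothetical helper).
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    // Parallel efficiency: speedup divided by the number of work-items n.
    static double efficiency(int n, double baselineSeconds, double seconds) {
        return speedup(baselineSeconds, seconds) / n;
    }

    public static void main(String[] args) {
        double baseline = SECONDS[0];
        for (int i = 0; i < SECONDS.length; i++) {
            int n = i + 1;
            System.out.printf("n=%d  time=%.1fs  speedup=%.2f  efficiency=%.2f%n",
                    n, SECONDS[i],
                    speedup(baseline, SECONDS[i]),
                    efficiency(n, baseline, SECONDS[i]));
        }
    }
}
```

With these numbers the speedup is close to linear for n=2 (about 1.9x) and levels off at roughly 4.5x from n=6 onward, which is the plateau the question asks about.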