Measuring time using pycuda.driver.Event gives wrong results
<p>I ran <a href="http://wiki.tiker.net/PyCuda/Examples/SimpleSpeedTest" rel="nofollow">SimpleSpeedTest.py</a> from the PyCuda examples, producing the following output:</p>
<pre><code>Using nbr_values == 8192
Calculating 100000 iterations
SourceModule time and first three results:
0.058294s, [ 0.005477 0.005477 0.005477]
Elementwise time and first three results:
0.102527s, [ 0.005477 0.005477 0.005477]
Elementwise Python looping time and first three results:
2.398071s, [ 0.005477 0.005477 0.005477]
GPUArray time and first three results:
8.207257s, [ 0.005477 0.005477 0.005477]
CPU time measured using :
0.000002s, [ 0.005477 0.005477 0.005477]
</code></pre>
<p><strong>The first four time measurements are reasonable, but the last one (0.000002s) is way off</strong>. The CPU result should be the slowest one, yet it is orders of magnitude faster than the fastest GPU method, so the measured time must be wrong. This is strange, since the same timing method seems to work fine for the first four results.</p>
<p>So I took some code from SimpleSpeedTest.py and made a small <strong>test file</strong> [2], which produced:</p>
<pre><code>time measured using option 1:
0.000002s
time measured using option 2:
5.989620s
</code></pre>
<p><strong>Option 1</strong> measures the duration using <code>pycuda.driver.Event.record()</code> (as in SimpleSpeedTest.py), while <strong>option 2</strong> uses <code>time.clock()</code>. Again, option 1 is off while option 2 gives a reasonable result (running the test file takes around 6s).</p>
<p>Does anyone have an idea as to why this is happening?</p>
<p>Since option 1 is endorsed in SimpleSpeedTest.py, could it be my setup that is causing the problem? I am running a GTX 470, Display Driver 301.42, CUDA 4.2, Python 2.7 (64-bit), PyCuda 2012.1, X5650 Xeon.</p>
<p>[2] <strong>Test file:</strong></p>
<pre><code>import numpy
import time
import pycuda.driver as drv
import pycuda.autoinit

n_iter = 100000
nbr_values = 8192  # = 64 * 128 (values as used in SimpleSpeedTest.py)

start = drv.Event()  # option 1 uses pycuda.driver.Event
end = drv.Event()

a = numpy.ones(nbr_values).astype(numpy.float32)  # test data

start.record()      # start option 1 (inserting recording points into GPU stream)
tic = time.clock()  # start option 2 (using CPU time)

for i in range(n_iter):
    a = numpy.sin(a)  # do some work

end.record()        # end option 1
toc = time.clock()  # end option 2

end.synchronize()

events_secs = start.time_till(end)*1e-3
time_secs = toc - tic

print "time measured using option 1:"
print "%fs " % events_secs

print "time measured using option 2:"
print "%fs " % time_secs
</code></pre>
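For reference, option 2 (host-side wall-clock timing) can be sketched as a small reusable helper. This is a minimal sketch under stated assumptions, not the code from the question: it assumes Python 3, where time.clock() has been removed and time.perf_counter() is the usual replacement, and it uses math.sin from the standard library as a stand-in for numpy.sin so that it runs without NumPy or a GPU. The names time_cpu_work and do_some_work are hypothetical.

```python
import math
import time


def time_cpu_work(func, n_iter):
    """Return the wall-clock seconds taken by n_iter calls of func()."""
    tic = time.perf_counter()  # host-side timer, analogous to option 2
    for _ in range(n_iter):
        func()
    toc = time.perf_counter()
    return toc - tic


def do_some_work():
    # Hypothetical stand-in for numpy.sin(a): CPU-only work on 64 values.
    return [math.sin(x) for x in (1.0,) * 64]


elapsed = time_cpu_work(do_some_work, 1000)
print("time measured using option 2: %fs" % elapsed)
```

Because the timer runs entirely on the host, the reported value covers exactly the CPU loop between the two perf_counter() calls, independent of anything queued on the GPU.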