
PyOpenCL benchmark questions
<p>I slightly modified the standard benchmark code from <a href="https://github.com/inducer/pyopencl/blob/master/examples/benchmark-all.py" rel="nofollow">https://github.com/inducer/pyopencl/blob/master/examples/benchmark-all.py</a></p>
<p>I replaced the hard-coded numbers with the variable zz:</p>
<pre><code>import pyopencl as cl
import numpy
import numpy.linalg as la
import datetime
from time import time

zz=100
a = numpy.random.rand(zz).astype(numpy.float32)
b = numpy.random.rand(zz).astype(numpy.float32)
c_result = numpy.empty_like(a)

# Speed in normal CPU usage
time1 = time()
for i in range(zz):
    for j in range(zz):
        c_result[i] = a[i] + b[i]
        c_result[i] = c_result[i] * (a[i] + b[i])
        c_result[i] = c_result[i] * (a[i] / 2)
time2 = time()
print("Execution time of test without OpenCL: ", time2 - time1, "s")

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print("===============================================================")
        print("Platform name:", platform.name)
        print("Platform profile:", platform.profile)
        print("Platform vendor:", platform.vendor)
        print("Platform version:", platform.version)
        print("---------------------------------------------------------------")
        print("Device name:", device.name)
        print("Device type:", cl.device_type.to_string(device.type))
        print("Device memory: ", device.global_mem_size//1024//1024, 'MB')
        print("Device max clock speed:", device.max_clock_frequency, 'MHz')
        print("Device compute units:", device.max_compute_units)

        # Simple speed test
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx,
                properties=cl.command_queue_properties.PROFILING_ENABLE)

        mf = cl.mem_flags
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
        dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, b.nbytes)

        prg = cl.Program(ctx, """
            __kernel void sum(__global const float *a,
            __global const float *b, __global float *c)
            {
                int loop;
                int gid = get_global_id(0);
                for(loop=0; loop&lt;%s;loop++)
                {
                    c[gid] = a[gid] + b[gid];
                    c[gid] = c[gid] * (a[gid] + b[gid]);
                    c[gid] = c[gid] * (a[gid] / 2);
                }
            }
            """ % (zz)).build()

        exec_evt = prg.sum(queue, a.shape, None, a_buf, b_buf, dest_buf)
        exec_evt.wait()
        elapsed = 1e-9*(exec_evt.profile.end - exec_evt.profile.start)

        print("Execution time of test: %g s" % elapsed)

        c = numpy.empty_like(a)
        # Older PyOpenCL API; newer releases use cl.enqueue_copy(queue, c, dest_buf)
        cl.enqueue_read_buffer(queue, dest_buf, c).wait()
        error = 0
        for i in range(zz):
            if c[i] != c_result[i]:
                error = 1
        if error:
            print("Results doesn't match!!")
        else:
            print("Results OK")
</code></pre>
<p>With zz=100 I get:</p>
<pre><code>('Execution time of test without OpenCL: ', 0.10500001907348633, 's')
===============================================================
('Platform name:', 'AMD Accelerated Parallel Processing')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'Advanced Micro Devices, Inc.')
('Platform version:', 'OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)')
---------------------------------------------------------------
('Device name:', 'Cypress\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
('Device type:', 'GPU')
('Device memory: ', 800, 'MB')
('Device max clock speed:', 850, 'MHz')
('Device compute units:', 20)
Execution time of test: 0.00168922 s
Results OK
===============================================================
('Platform name:', 'AMD Accelerated Parallel Processing')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'Advanced Micro Devices, Inc.')
('Platform version:', 'OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)')
---------------------------------------------------------------
('Device name:', 'Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
('Device type:', 'CPU')
('Device memory: ', 8183L, 'MB')
('Device max clock speed:', 3000, 'MHz')
('Device compute units:', 4)
Execution time of test: 4.369e-05 s
Results OK
</code></pre>
<p>That gives three timings:</p>
<pre><code>normal ('Execution time of test without OpenCL: ', 0.10500001907348633, 's')
pyopencl radeon 5870: Execution time of test: 0.00168922 s
pyopencl i5 CPU 750: Execution time of test: 4.369e-05 s
</code></pre>
<p>First set of questions: what is the "pyopencl i5 CPU 750" device? Why is it about 2,400 times faster than "normal" (the test without OpenCL)? And why is it about 38 times faster than "pyopencl radeon 5870"?</p>
<p>With zz=1000 I get:</p>
<pre><code>normal ('Execution time of test without OpenCL: ', 9.05299997329712, 's')
pyopencl radeon 5870: Execution time of test: 0.0104431 s
pyopencl i5 CPU 750: Execution time of test: 0.00238112 s
</code></pre>
<p>i5 * ~5 = radeon5870</p>
<p>i5 * ~3800 = normal</p>
<p>With zz=10000:</p>
<pre><code>normal: takes too long, so I commented that code out
radeon 5870: Execution time of test: 0.085571 s
i5: Execution time of test: 0.261854 s
</code></pre>
<p>Here the video card finally wins.</p>
<p>It is also interesting to compare how the timings grow from one stage to the next (stage1 is zz=100, stage2 is zz=1000, stage3 is zz=10000):</p>
<p>normal_stage1 * ~90 = normal_stage2; normal_stage2 * ~95 = normal_stage3 (an estimate based on experience, not measured)</p>
<p>i5_stage1 * ~52 = i5_stage2; i5_stage2 * ~109 = i5_stage3</p>
<p>radeon5870_stage1 * ~6 = radeon5870_stage2; radeon5870_stage2 * ~8 = radeon5870_stage3</p>
<p>Can somebody explain why the OpenCL timings do not grow linearly?</p>
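<p>The stage-to-stage ratios can be recomputed directly from the timings quoted in the question. The sketch below does only that arithmetic; the names <code>timings</code>, <code>growth_ratios</code> and <code>work</code> are illustrative and not part of the benchmark. It also prints how much the operation count grows per step, assuming the work is zz loop iterations for each of zz elements, as in both the Python loop and the kernel shown in the question.</p>

```python
# Timings in seconds, copied from the runs quoted in the question (zz -> elapsed).
# The "normal" run at zz=10000 was not measured, so it is left out.
timings = {
    "normal": {100: 0.10500001907348633, 1000: 9.05299997329712},
    "radeon5870": {100: 0.00168922, 1000: 0.0104431, 10000: 0.085571},
    "i5": {100: 4.369e-05, 1000: 0.00238112, 10000: 0.261854},
}


def growth_ratios(series):
    """Return each timing divided by the previous one, ordered by zz."""
    sizes = sorted(series)
    return [series[big] / series[small] for small, big in zip(sizes, sizes[1:])]


def work(zz):
    """Operation count under the assumption of zz iterations per element, zz elements."""
    return zz * zz


for name, series in timings.items():
    ratios = ", ".join("%.1fx" % r for r in growth_ratios(series))
    print("%s: %s" % (name, ratios))

# Growth in operation count for each tenfold increase of zz.
print("work growth per step: %dx" % (work(1000) // work(100)))
```

<p>Comparing each printed ratio with the growth in operation count shows how far each device is from scaling with the amount of work.</p>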