StackOverflow2013


plurals

PO PyOpenCl benchmark questions
primarykey
Id
7376616
data
AcceptedAnswerId
7379085
AnswerCount
1
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2011-09-11T06:07:59.147
FavoriteCount
0
LastActivityDate
2011-09-11T19:14:11.010
LastEditDate
LastEditorUserId
0
OwnerUserId
824088
ParentId
0
PostTypeId
1
Score
3
ViewCount
2385
LastEditorDisplayName
text
Body
I slightly modified the standard example from <a href="https://github.com/inducer/pyopencl/blob/master/examples/benchmark-all.py" rel="nofollow">https://github.com/inducer/pyopencl/blob/master/examples/benchmark-all.py</a>, replacing the hard-coded sizes with the variable zz:
<pre><code>import pyopencl as cl
import numpy
import numpy.linalg as la
import datetime
from time import time

zz = 100

a = numpy.random.rand(zz).astype(numpy.float32)
b = numpy.random.rand(zz).astype(numpy.float32)
c_result = numpy.empty_like(a)

# Speed in normal CPU usage: the inner j loop repeats the same
# three statements zz times per element, mirroring the kernel's loop
time1 = time()
for i in range(zz):
    for j in range(zz):
        c_result[i] = a[i] + b[i]
        c_result[i] = c_result[i] * (a[i] + b[i])
        c_result[i] = c_result[i] * (a[i] / 2)
time2 = time()
print("Execution time of test without OpenCL: ", time2 - time1, "s")

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print("===============================================================")
        print("Platform name:", platform.name)
        print("Platform profile:", platform.profile)
        print("Platform vendor:", platform.vendor)
        print("Platform version:", platform.version)
        print("---------------------------------------------------------------")
        print("Device name:", device.name)
        print("Device type:", cl.device_type.to_string(device.type))
        print("Device memory: ", device.global_mem_size//1024//1024, 'MB')
        print("Device max clock speed:", device.max_clock_frequency, 'MHz')
        print("Device compute units:", device.max_compute_units)

        # Simple speed test
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx,
                properties=cl.command_queue_properties.PROFILING_ENABLE)

        mf = cl.mem_flags
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
        dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, b.nbytes)

        # Each of the zz work-items loops zz times
        prg = cl.Program(ctx, """
            __kernel void sum(__global const float *a,
            __global const float *b, __global float *c)
            {
                int loop;
                int gid = get_global_id(0);
                for(loop=0; loop<%s;loop++)
                {
                    c[gid] = a[gid] + b[gid];
                    c[gid] = c[gid] * (a[gid] + b[gid]);
                    c[gid] = c[gid] * (a[gid] / 2);
                }
            }
            """ % (zz)).build()

        exec_evt = prg.sum(queue, a.shape, None, a_buf, b_buf, dest_buf)
        exec_evt.wait()
        # Profiling timestamps are in nanoseconds; this covers kernel execution only
        elapsed = 1e-9*(exec_evt.profile.end - exec_evt.profile.start)

        print("Execution time of test: %g s" % elapsed)

        # Copy the result back to the host (enqueue_copy replaces the
        # older enqueue_read_buffer) and compare with the CPU result
        c = numpy.empty_like(a)
        cl.enqueue_copy(queue, c, dest_buf).wait()
        error = 0
        for i in range(zz):
            if c[i] != c_result[i]:
                error = 1
        if error:
            print("Results don't match!!")
        else:
            print("Results OK")
</code></pre>
With zz=100 I get:
<pre><code>('Execution time of test without OpenCL: ', 0.10500001907348633, 's')
===============================================================
('Platform name:', 'AMD Accelerated Parallel Processing')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'Advanced Micro Devices, Inc.')
('Platform version:', 'OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)')
---------------------------------------------------------------
('Device name:', 'Cypress\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
('Device type:', 'GPU')
('Device memory: ', 800, 'MB')
('Device max clock speed:', 850, 'MHz')
('Device compute units:', 20)
Execution time of test: 0.00168922 s
Results OK
===============================================================
('Platform name:', 'AMD Accelerated Parallel Processing')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'Advanced Micro Devices, Inc.')
('Platform version:', 'OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)')
---------------------------------------------------------------
('Device name:', 'Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
('Device type:', 'CPU')
('Device memory: ', 8183L, 'MB')
('Device max clock speed:', 3000, 'MHz')
('Device compute units:', 4)
Execution time of test: 4.369e-05 s
Results OK
</code></pre>
So there are three times:
<pre><code>normal:               ('Execution time of test without OpenCL: ', 0.10500001907348633, 's')
pyopencl radeon 5870: Execution time of test: 0.00168922 s
pyopencl i5 CPU 750:  Execution time of test: 4.369e-05 s
</code></pre>
First set of questions: what exactly is the "pyopencl i5 CPU 750" device? Why is it about 2,400 times faster than "normal" (the test without OpenCL)? And why is it about 38 times faster than "pyopencl radeon 5870"?
With zz=1000 we have:
<pre><code>normal:               ('Execution time of test without OpenCL: ', 9.05299997329712, 's')
pyopencl radeon 5870: Execution time of test: 0.0104431 s
pyopencl i5 CPU 750:  Execution time of test: 0.00238112 s
</code></pre>
Now the i5 is only about 4.4 times faster than the Radeon 5870, and about 3,800 times faster than normal.
With zz=10000:
<pre><code>normal:      takes too long, so I commented that code out
radeon 5870: Execution time of test: 0.085571 s
i5 CPU 750:  Execution time of test: 0.261854 s
</code></pre>
Here the video card finally wins. It is also interesting to compare how each time grows from one zz to the next (stage 1, 2, 3 = zz of 100, 1000, 10000):
<pre><code>normal_stage1 * ~86  = normal_stage2
normal_stage2 * ~95  = normal_stage3 (estimated from experience, not measured)
i5_stage1     * ~55  = i5_stage2
i5_stage2     * ~110 = i5_stage3
radeon_stage1 * ~6   = radeon_stage2
radeon_stage2 * ~8   = radeon_stage3
</code></pre>
Each step multiplies zz by 10, and therefore the total number of loop iterations (zz × zz) by 100. Can somebody explain why the OpenCL times do not grow linearly?
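The timing ratios in the question can be rechecked with a few lines of plain Python. This is only a sketch: the dictionary holds the times exactly as reported in the question for zz = 100, 1000 and 10000 (None where the pure-Python run was skipped), and the helper names `work`, `ratio`, `growth_factors`, `speedup` and `fmt` are hypothetical. `work(zz)` assumes the cost is zz*zz inner iterations, which follows from both the Python loop and the kernel repeating the same three statements zz times for each of zz elements.

```python
# Timings in seconds, copied from the question.
# Index k corresponds to SIZES[k]; None marks a run that was not measured.
SIZES = [100, 1000, 10000]
TIMINGS = {
    "normal": [0.10500001907348633, 9.05299997329712, None],
    "radeon 5870": [0.00168922, 0.0104431, 0.085571],
    "i5 CPU 750": [4.369e-05, 0.00238112, 0.261854],
}


def work(zz):
    """Inner-loop iterations: zz elements, each repeating the body zz times."""
    return zz * zz


def ratio(num, den):
    """num / den, or None if either timing is missing."""
    if num is None or den is None:
        return None
    return num / den


def growth_factors(times):
    """How much the time grows from each size to the next: t[k+1] / t[k]."""
    return [ratio(cur, prev) for prev, cur in zip(times, times[1:])]


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`, size by size."""
    return [ratio(s, f) for s, f in zip(slow, fast)]


def fmt(values):
    """Round for display, keeping None as-is."""
    return [None if v is None else round(v, 1) for v in values]


if __name__ == "__main__":
    print("work growth per step:",
          [work(b) // work(a) for a, b in zip(SIZES, SIZES[1:])])
    for name, times in TIMINGS.items():
        print(name, "time growth per step:", fmt(growth_factors(times)))
    print("i5 vs normal speedup:",
          fmt(speedup(TIMINGS["normal"], TIMINGS["i5 CPU 750"])))
    print("i5 vs radeon speedup:",
          fmt(speedup(TIMINGS["radeon 5870"], TIMINGS["i5 CPU 750"])))
```

Comparing each entry of `growth_factors` with the hundredfold growth in `work` is the comparison behind the closing question; the script only does arithmetic on the reported numbers and does not explain the differences.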
Tags
<python><benchmarking><opencl>
Title
PyOpenCl benchmark questions
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PT Answer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PT Question
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. US Echeg
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PT Answer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO PyOpenCl benchmark questions
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
2. VO
 singulars
 PostPostId
 PO PyOpenCl benchmark questions
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
3. VO
 singulars
 PostPostId
 PO PyOpenCl benchmark questions
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId
