Note that there are some explanatory texts on larger screens.

plurals
  1. POWhat is the best way to implement a small lookup table in an OpenCL Kernel
    primarykey
    data
    text
    <p>In my kernel it is necessary to make a large number of random accesses to a small lookup table (only 8 32-bit integers). Each kernel has a unique lookup table. Below is a simplified version of the kernel to illustrate how the lookup table is used.</p> <pre><code>__kernel void some_kernel( __global uint* global_table, __global uint* X, __global uint* Y) { size_t gsi = get_global_size(0); size_t gid = get_global_id(0); __private uint LUT[8]; // 8 words of of global_table is copied to LUT // Y is assigned a value from the lookup table based on the current value of X for (size_t i = 0; i &lt; n; i++) { Y[i*gsi+gid] = LUT[X[i*gsi+gid]]; } } </code></pre> <p>Because of the small size, I am getting the best performance by keeping the table in the __private memory space. However, because of the random nature in which the lookup table is accessed, there is still a large performance hit. With the lookup table code removed (replaced with a simple arithmetic operation, for example), although the kernel would provide the wrong answer, the performance improves by a factor of over 3.</p> <p>Is there a better way? Have I overlooked some OpenCL feature that provides efficient random access for very small chunks of memory? Could there be an efficient solution using vector types?</p> <p>[edit] Note, that the maximum value of X is 7, but the maximum value of Y is as large as 2^32-1. In other words, all the bits of the lookup table are being used, so it cannot be packed into a smaller representation.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload