    <p>Two hints:</p> <ol> <li>Change the order of your double loop iterations: iterate over y first (outer loop), then over x (inner loop). This is the most important fix, because it applies to any double loop you will ever implement. You want successive reads to be as close to each other in memory as possible. The reasons differ by device: for single-threaded CPU code it matters most for caching; on GPUs it matters most for coalesced memory access, and possibly for caching as well. Since your inner loop currently iterates over rows, you have zero coalescing right now, effectively sending a single 2-byte (pixel width) request per pixel. <a href="https://stackoverflow.com/questions/18538701/median-selection-in-cuda-kernel/19649875#19649875">Make sure to read this thread</a> on the matter, even though it only explains the CPU side of things.</li> <li>Make sure your reads are coalesced. In your example, even with the loop order fixed, you would only be reading <code>block_width * pixel_width</code>, i.e. 16 x 2 = 32 contiguous bytes at a time. That requires higher occupancy to hide latency than reading 128 bytes at a time would. You can improve things by using wider blocks (wider blocks are generally better for that very reason). Also make sure your reads are aligned. This is related to the previous point and explained in [this section of the CUDA C Programming Guide][7].</li> </ol> <p>EDIT: I moved the rest of the answer <a href="https://stackoverflow.com/questions/18538701/median-selection-in-cuda-kernel/19649875#19649875">here</a>.</p>
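    The two hints can be illustrated with a small host-side C++ sketch. It is not the asker's kernel: it assumes a row-major image of 2-byte pixels, and the names `sum_row_major`, `sum_column_major` and `contiguous_bytes` are hypothetical helpers introduced only for illustration. The first function uses the recommended loop order, the second shows the order to avoid, and the third reproduces the `block_width * pixel_width` arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recommended order: y in the outer loop, x in the inner loop.
// For a row-major image, successive reads are adjacent in memory.
std::uint64_t sum_row_major(const std::vector<std::uint16_t>& pixels,
                            std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    std::uint64_t sum = 0;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            sum += pixels[y * width + x];  // stride: 1 element
        }
    }
    return sum;
}

// Order to avoid: x in the outer loop, y in the inner loop.
// Same result, but successive reads are `width` elements apart.
std::uint64_t sum_column_major(const std::vector<std::uint16_t>& pixels,
                               std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    std::uint64_t sum = 0;
    for (std::size_t x = 0; x < width; ++x) {
        for (std::size_t y = 0; y < height; ++y) {
            sum += pixels[y * width + x];  // stride: `width` elements
        }
    }
    return sum;
}

// Contiguous bytes touched by one row of a thread block,
// i.e. block_width * pixel_width from the second hint.
constexpr std::size_t contiguous_bytes(std::size_t block_width,
                                       std::size_t pixel_width) {
    return block_width * pixel_width;
}
```

    With 2-byte pixels, `contiguous_bytes(16, 2)` gives the 32 bytes mentioned above, while a block 64 threads wide (an illustrative value, not taken from the question) would reach 128 bytes per row.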
 
