Vectorize a loop which accesses non-consecutive memory locations
    <p>I have a loop of this structure:</p> <p><strong>Reference: <a href="http://sc.tamu.edu/help/power/powerlearn/html/ScalarOptnw/sld018.htm" rel="nofollow">Maxwell Code Example</a></strong></p> <pre><code>do z=1,zend
  do y=1,yend
    do x=1,xend
      k=arr(x,y,z)
      do while(k.ne.0)
        ix=fooX(k)
        iy=fooY(k)
        iz=fooZ(k)
        x1=x(ix  ,iy  ,iz)
        x2=x(ix+1,iy  ,iz)
        x3=x(ix  ,iy+1,iz)
        x4=x(ix+1,iy+1,iz)
        x5=x(ix  ,iy  ,iz+1)
        x6=x(ix+1,iy  ,iz+1)
        x7=x(ix  ,iy+1,iz+1)
        x8=x(ix+1,iy+1,iz+1)
        y1=y(ix  ,iy  ,iz)
        y2=y(ix+1,iy  ,iz)
        y3=y(ix  ,iy+1,iz)
        y4=y(ix+1,iy+1,iz)
        y5=y(ix  ,iy  ,iz+1)
        y6=y(ix+1,iy  ,iz+1)
        y7=y(ix  ,iy+1,iz+1)
        y8=y(ix+1,iy+1,iz+1)
        z1=z(ix  ,iy  ,iz)
        z2=z(ix+1,iy  ,iz)
        z3=z(ix  ,iy+1,iz)
        z4=z(ix+1,iy+1,iz)
        z5=z(ix  ,iy  ,iz+1)
        z6=z(ix+1,iy  ,iz+1)
        z7=z(ix  ,iy+1,iz+1)
        z8=z(ix+1,iy+1,iz+1)
        sumX+=x1+x2+..x8
        sumY+=y1+y2+..y8
        sumZ+=z1+z2+..z8
        k=linkArr(k)
      enddo
    enddo
  enddo
enddo
</code></pre> <p>x1 through x8 are the 8 corners of a rectangular cuboid. There are three challenges in vectorizing this code. First, the 8 array elements are not contiguous in memory. Second, there is the inherent while-loop structure combined with the linked-list access. Third, the values of ix, iy, iz returned from fooX, fooY, fooZ are not contiguous, so each iteration of the loop works on a completely different set of ix, iy, iz, and the memory access is scattered even across iterations.</p> <p>I tried the following approaches:</p> <p>1. I unrolled the 3-level DO loops as:</p> <pre><code>do z=1,zend
  do y=1,yend
    do x=1,xend
      if(arr(x,y,z).NE.0) then
        kArr(indx)=arr(x,y,z)
        DO WHILE (kArr(indx).NE.0)
          indx = indx + 1
          kArr(indx)=linkArr(kArr(indx-1))
        ENDDO
      endif
    enddo
  enddo
enddo
</code></pre> <p>With this I got rid of the while-loop structure, and I can now run one big loop over kArr, inside which I group 8 elements (say my VPU can accommodate 8 sets of data at a time). It did not give a performance improvement. I can post the details if anyone is interested. I need suggestions on how to optimize this code.</p> <p>2. Another option I tried was to combine the x, y, z data in a single array, so that when I access x1, the values y1 &amp; z1 are also in adjacent memory locations.</p>
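    <p>To make the two attempts concrete, here is a minimal sketch of how the single flat loop over kArr and the combined x, y, z array could fit together. It is an illustration under assumptions, not the actual code: the array name xyz, its component-first layout, the sizes n and nk, and all the fill values are hypothetical. Whether a given compiler vectorizes the corner loops is not guaranteed; the sketch only shows that the three components of each corner become one contiguous 3-element access, while the accesses from one k to the next remain scattered.</p> <pre><code>program flat_gather_sketch
  implicit none
  ! Hypothetical sizes: n points per grid dimension, nk entries in kArr
  integer, parameter :: n = 4, nk = 3
  ! Assumed combined layout: component index first, so the x, y, z
  ! values of one grid point are adjacent in memory
  real :: xyz(3, n, n, n)
  integer :: kArr(nk), fooX(nk), fooY(nk), fooZ(nk)
  real :: s(3)          ! s(1)=sumX, s(2)=sumY, s(3)=sumZ
  integer :: i, k, ix, iy, iz, dx, dy, dz

  ! Made-up fill values, only so that the sketch runs
  xyz(1,:,:,:) = 1.0
  xyz(2,:,:,:) = 2.0
  xyz(3,:,:,:) = 3.0
  kArr = [2, 3, 1]
  fooX = [1, 2, 3]
  fooY = [3, 1, 2]
  fooZ = [2, 3, 1]

  s = 0.0
  ! One flat loop over the flattened list replaces the
  ! triple DO loop plus DO WHILE of the original structure
  do i = 1, nk
    k = kArr(i)
    ix = fooX(k)
    iy = fooY(k)
    iz = fooZ(k)
    ! Visit the 8 corners of the cuboid anchored at (ix,iy,iz)
    do dz = 0, 1
      do dy = 0, 1
        do dx = 0, 1
          ! Contiguous load of the 3 components of one corner
          s = s + xyz(:, ix+dx, iy+dy, iz+dz)
        end do
      end do
    end do
  end do

  print *, 'sumX, sumY, sumZ =', s
end program flat_gather_sketch
</code></pre> <p>With this layout the pairs of corners at ix and ix+1 are also adjacent in memory, so each (dy, dz) combination touches 6 consecutive reals.</p>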