<p>One approach is to flatten your 2D array and handle it in a 1D fashion, using pointer arithmetic to handle the row and column dimensions. First, in your struct definition, replace the <code>antenna_pattern</code> element with:</p>
<pre><code>struct LR {
  .
  .
  float *antenna_pattern;
} LR;
</code></pre>
<p>Then you will need a host-side <code>malloc</code> to allocate space:</p>
<pre><code>#define COL 1001
#define ROW 361
#define DSIZE (ROW*COL)
LR.antenna_pattern = (float *)malloc(DSIZE*sizeof(float));
</code></pre>
<p>And a device-side <code>cudaMalloc</code>:</p>
<pre><code>float *d_antenna_pattern;
cudaMalloc((void **) &amp;d_antenna_pattern, DSIZE*sizeof(float));
</code></pre>
<p>The copy to the device looks like:</p>
<pre><code>cudaMemcpy(d_antenna_pattern, LR.antenna_pattern, DSIZE*sizeof(float), cudaMemcpyHostToDevice);
</code></pre>
<p>When you want to reference into these arrays, you will have to do the pointer arithmetic yourself (here <code>ap</code> is the <code>float *</code> kernel parameter that receives <code>d_antenna_pattern</code>):</p>
<pre><code>float my_val_xy = ap[(x*COL)+y];  // to access element at [x][y] on the device
float my_val_xy = LR.antenna_pattern[(x*COL)+y];  // on the host
</code></pre>
<p>If you want to keep the 2D array subscripts throughout, you can do this with an appropriate typedef. For an example, see the first code sample in my answer to <a href="https://stackoverflow.com/questions/12924155/sending-3d-array-to-cuda-kernel/12925014#12925014">this question</a>. To sketch this out, you would start with a typedef:</p>
<pre><code>#define COL 1001
#define ROW 361
#define DSIZE (ROW*COL)
typedef float aParray[COL];
</code></pre>
<p>and modify your structure definition:</p>
<pre><code>struct LR {
  .
  .
  aParray *antenna_pattern;
} LR;
</code></pre>
<p>The host-side <code>malloc</code> would look like:</p>
<pre><code>LR.antenna_pattern = (aParray *)malloc(DSIZE*sizeof(float));
</code></pre>
<p>The device-side <code>cudaMalloc</code> would look like:</p>
<pre><code>aParray *d_antenna_pattern;
cudaMalloc((void **) &amp;d_antenna_pattern, DSIZE*sizeof(float));
</code></pre>
<p>The copy to the device looks like:</p>
<pre><code>cudaMemcpy(d_antenna_pattern, LR.antenna_pattern, DSIZE*sizeof(float), cudaMemcpyHostToDevice);
</code></pre>
<p>The device kernel definition will need a function parameter like:</p>
<pre><code>__global__ void myKernel(float ap[][COL]) {
</code></pre>
<p>Then inside the kernel you can access an element at x,y as:</p>
<pre><code>float my_val_xy = ap[x][y];
</code></pre>
<p>Now, in response to a follow-up question asking what to do if <code>LR</code> cannot be changed, here is a complete sample code which combines some of these ideas without modifying the <code>LR</code> structure:</p>
<pre><code>#include &lt;stdio.h&gt;

// for cuda error checking
#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            return 1; \
        } \
    } while (0)

struct LR {
  int foo;
  float antenna_pattern[361][1001];
} LR;

__global__ void mykernel(float ap[][1001]){
  int tid = threadIdx.x + (blockDim.x*blockIdx.x);
  float myval = 0.0;

  if (tid == 0){
    for (int i=0; i&lt;361; i++)
      for (int j=0; j&lt;1001; j++)
        ap[i][j] = myval++;
  }
}

int main(){
  typedef float aParray[1001];
  aParray *d_antenna_pattern;
  cudaMalloc((void **) &amp;d_antenna_pattern, (361*1001)*sizeof(float));
  cudaCheckErrors("cudaMalloc fail");

  float *my_ap_ptr;
  my_ap_ptr = &amp;(LR.antenna_pattern[0][0]);

  for (int i=0; i&lt;361; i++)
    for (int j=0; j&lt;1001; j++)
      LR.antenna_pattern[i][j] = 0.0;

  cudaMemcpy(d_antenna_pattern, my_ap_ptr, (361*1001)*sizeof(float), cudaMemcpyHostToDevice);
  cudaCheckErrors("cudaMemcpy fail");

  mykernel&lt;&lt;&lt;1,1&gt;&gt;&gt;(d_antenna_pattern);
  cudaCheckErrors("Kernel fail");

  cudaMemcpy(my_ap_ptr, d_antenna_pattern, (361*1001)*sizeof(float), cudaMemcpyDeviceToHost);
  cudaCheckErrors("cudaMemcpy 2 fail");

  float myval = 0.0;
  for (int i=0; i&lt;361; i++)
    for (int j=0; j&lt;1001; j++)
      if (LR.antenna_pattern[i][j] != myval++) {
        printf("mismatch at offset x: %d y: %d actual: %f expected: %f\n", i, j, LR.antenna_pattern[i][j], --myval);
        return 1;
      }
  printf("Results match!\n");
  return 0;
}
</code></pre>
<p>If you prefer to use the flattened method, replace the <code>d_antenna_pattern</code> definition with:</p>
<pre><code>float *d_antenna_pattern;
</code></pre>
<p>And change the kernel function parameter correspondingly to:</p>
<pre><code>__global__ void mykernel(float *ap){
</code></pre>
<p>Then access using the pointer arithmetic method in the kernel:</p>
<pre><code>ap[(i*1001)+j] = myval++;
</code></pre>
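<p>To see why the two access styles are interchangeable, the host-only sketch below (plain C, no CUDA calls) checks that a 2D subscript through an <code>aParray *</code> and the flattened offset <code>(x*COL)+y</code> land on the same element of one contiguous allocation. The helper names <code>flat_index</code> and <code>views_agree</code> are hypothetical and introduced only for illustration; the same row-major layout is what the kernel relies on.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ROW 361
#define COL 1001
#define DSIZE (ROW*COL)

/* One row of COL floats, matching the typedef used in the answer. */
typedef float aParray[COL];

/* Hypothetical helper: row-major offset of element [x][y] in the flat array. */
size_t flat_index(int x, int y) {
    assert(x >= 0 && x < ROW && y >= 0 && y < COL);
    return (size_t)x * COL + (size_t)y;
}

/* Hypothetical helper: returns 1 if the 2D view and the flattened view
   refer to the same element for [x][y], 0 if not, -1 if allocation fails. */
int views_agree(int x, int y) {
    /* One contiguous, zero-initialized block of ROW*COL floats. */
    float *flat = (float *)calloc(DSIZE, sizeof(float));
    if (flat == NULL) return -1;

    /* View the same memory as an array of rows. */
    aParray *rows = (aParray *)flat;

    rows[x][y] = 42.0f;  /* write through the 2D subscripts */

    /* Read back through the flattened index and compare addresses too. */
    int ok = (&rows[x][y] == &flat[flat_index(x, y)]) &&
             (flat[flat_index(x, y)] == 42.0f);

    free(flat);
    return ok;
}
```

<p>For instance, <code>flat_index(12, 345)</code> evaluates to 12*1001 + 345 = 12357, which is the same element that <code>rows[12][345]</code> addresses.</p>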