<p>This is a perfect problem for vectorization. As Jake noted, vectorization is sometimes avoided because it’s a pain to maintain separate code paths for each architecture. In an ideal world, compilers would successfully autovectorize all of these cases and it wouldn’t be an issue. In the meantime, there are a few other options available.</p>
<p>If you’re targeting iOS / OS X and can limit yourself to clang, the nicest solution for simple loops like these is clang’s “extended vectors”, which let you write vector code that works across architectures:</p>
<pre><code>// 4 x int vector; aligned(4) lowers the required alignment to that of a
// plain int, so in and out do not need to be 16-byte aligned.
typedef int vector_int __attribute__((ext_vector_type(4),aligned(4)));
const int ints_per_vector = 4;

int Add_8K_3(int *in, int *out, int b) {
    vector_int *vin = (vector_int *)in;
    vector_int *vout = (vector_int *)out;
    // 1024 ints (4 KB), four at a time; the scalar b is splatted to all lanes.
    for (int i=0; i&lt;1024/ints_per_vector; i++)
        vout[i] = vin[i] + b;
    return 0;
}
</code></pre>
<p>This generates decent (though not perfect) vector code for all the architectures that clang supports. For example, armv7s:</p>
<pre><code>0:  adds     r3, r0, r2
    vld1.32  {d18, d19}, [r3]
    adds     r3, r1, r2
    adds     r2, #0x10
    cmp.w    r2, #0x1000
    vadd.i32 q9, q9, q8
    vst1.32  {d18, d19}, [r3]
    bne      0b
</code></pre>
<p>arm64:</p>
<pre><code>0:  ldr    q1, [x0, x8, lsl #4]
    add.4s v1, v1, v0
    str    q1, [x1, x8, lsl #4]
    add    x8, x8, 1
    cmp    w8, #256
    b.ne   0b
</code></pre>
<p>x86_64:</p>
<pre><code>0:  movdqu (%rdi,%rax), %xmm1
    paddd  %xmm0, %xmm1
    movdqu %xmm1, (%rsi,%rax)
    add    $0x10, %rax
    cmp    $0x1000, %eax
    jne    0b
</code></pre>
<p>If your code needs to be portable to other compilers, however, it’s preferable to use intrinsics or rely on compiler optimization instead. And if you really need your code to go absolutely as fast as possible, some amount of hand-tuning is unavoidable.</p>
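<p>A minimal sketch of a variant that also builds with GCC, assuming a GCC-compatible compiler: it swaps clang’s <code>ext_vector_type</code> for the GCC-style <code>vector_size</code> attribute (accepted by both GCC and clang), loads and stores through <code>memcpy</code> instead of casting <code>int *</code> to a vector pointer, and handles an arbitrary length <code>n</code> with a scalar tail loop. The names <code>v4i</code>, <code>add_scalar_vec</code> and <code>add_scalar_ref</code> are hypothetical; <code>add_scalar_ref</code> is the plain loop you would hand to the compiler under the “rely on compiler optimization” option.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: 4 x 32-bit int (16 bytes), declared with the GCC-style
   vector_size attribute, which both GCC and clang accept. */
typedef int v4i __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = in[i] + b for 0 <= i < n, one vector at a time.
   memcpy gives alignment- and aliasing-safe loads/stores; optimizing compilers
   typically lower each call to a single unaligned vector load or store. */
void add_scalar_vec(const int *in, int *out, size_t n, int b)
{
    const size_t lanes = sizeof(v4i) / sizeof(int);
    size_t i = 0;

    for (; i + lanes <= n; i += lanes) {
        v4i v;
        memcpy(&v, in + i, sizeof v);   /* load one vector of ints */
        v = v + b;                      /* scalar b is broadcast to every lane */
        memcpy(out + i, &v, sizeof v);  /* store one vector of ints */
    }
    for (; i < n; i++)                  /* scalar tail when n is not a multiple of lanes */
        out[i] = in[i] + b;
}

/* Plain scalar loop: the "rely on compiler optimization" option. An optimizing
   compiler may autovectorize it (e.g. at -O3); it also serves as a reference. */
void add_scalar_ref(const int *in, int *out, size_t n, int b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + b;
}
```

<p>Comparing <code>add_scalar_vec</code> against <code>add_scalar_ref</code> on a length that is not a multiple of 4 exercises both the vector body and the scalar tail.</p>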