
Simulating packusdw functionality with SSE2
<p>I'm implementing a fast x888 -> 565 pixel conversion function in <a href="http://cgit.freedesktop.org/pixman/" rel="noreferrer">pixman</a> according to the algorithm described <a href="http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf" rel="noreferrer">by Intel [pdf]</a>. Their code converts x888 -> 555, while I want to convert to 565. Unfortunately, converting to 565 means that the high bit of each 16-bit result can be set, so I can't use the signed-saturation pack instructions. The unsigned pack instruction, packusdw, wasn't added until SSE4.1. I'd like to implement its functionality with SSE2 or find another way of doing this.</p>
<p>This function takes two XMM registers containing four 32-bit pixels each and outputs a single XMM register containing the eight converted RGB565 pixels.</p>
<pre><code>static force_inline __m128i
pack_565_2packedx128_128 (__m128i lo, __m128i hi)
{
    __m128i rb0 = _mm_and_si128 (lo, mask_565_rb);
    __m128i rb1 = _mm_and_si128 (hi, mask_565_rb);

    __m128i t0 = _mm_madd_epi16 (rb0, mask_565_pack_multiplier);
    __m128i t1 = _mm_madd_epi16 (rb1, mask_565_pack_multiplier);

    __m128i g0 = _mm_and_si128 (lo, mask_green);
    __m128i g1 = _mm_and_si128 (hi, mask_green);

    t0 = _mm_or_si128 (t0, g0);
    t1 = _mm_or_si128 (t1, g1);

    t0 = _mm_srli_epi32 (t0, 5);
    t1 = _mm_srli_epi32 (t1, 5);

    /* XXX: maybe there's a way to do this relatively efficiently with SSE2? */
    return _mm_packus_epi32 (t0, t1);
}
</code></pre>
<p>Ideas I've thought of:</p>
<ul>
<li><p>Subtract 0x8000, use _mm_packs_epi32, then re-add 0x8000 to each 565 pixel. I've tried this, but I can't make it work.</p>
<pre><code>t0 = _mm_sub_epi16 (t0, mask_8000);
t1 = _mm_sub_epi16 (t1, mask_8000);
t0 = _mm_packs_epi32 (t0, t1);
return _mm_add_epi16 (t0, mask_8000);
</code></pre></li>
<li><p>Shuffle the data instead of packing it. This works for MMX, but since the SSE 16-bit shuffles work on only the high or low 64 bits, it would get messy.</p></li>
<li><p>Save the high bits, set them to zero, do the pack, and restore them afterwards. This seems quite messy.</p></li>
</ul>
<p>Is there some other (hopefully more efficient) way I could do this?</p>
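<p>The first idea in the list can be sketched with SSE2 only. The helper names <code>packus_epi32_sse2</code> and <code>pack_u16_inrange_sse2</code> below are hypothetical and not part of pixman. One possible reason the attempt shown above fails, assuming <code>mask_8000</code> holds 0x8000 in every 16-bit lane, is that <code>_mm_sub_epi16</code> also changes the upper half of each 32-bit lane, so every value saturates in the pack. The sketch subtracts the bias per 32-bit lane instead, and also shows a shorter variant that is only valid when every input is already within 0..0xFFFF, which appears to be the case after the shift by 5 in the function above.</p>

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not a pixman function): approximates
 * _mm_packus_epi32 using SSE2 only.
 * Each signed 32-bit lane is clamped to 0..0xFFFF and stored as an
 * unsigned 16-bit lane. Assumption: inputs are not so negative that
 * subtracting 0x8000 wraps around INT32_MIN. */
static inline __m128i
packus_epi32_sse2 (__m128i lo, __m128i hi)
{
    const __m128i bias32 = _mm_set1_epi32 (0x8000);   /* per 32-bit lane */
    const __m128i bias16 = _mm_set1_epi16 (-0x8000);  /* 0x8000 per 16-bit lane */

    /* Move the range 0..0xFFFF onto -0x8000..0x7FFF so that the signed
     * pack saturates exactly where the unsigned pack would. */
    lo = _mm_sub_epi32 (lo, bias32);
    hi = _mm_sub_epi32 (hi, bias32);

    __m128i packed = _mm_packs_epi32 (lo, hi);

    /* Undo the bias; 16-bit addition wraps modulo 2^16. */
    return _mm_add_epi16 (packed, bias16);
}

/* Hypothetical variant for inputs already known to be in 0..0xFFFF.
 * Sign-extending the low 16 bits of each lane makes the signed pack
 * copy those bits unchanged, because no lane can saturate. */
static inline __m128i
pack_u16_inrange_sse2 (__m128i lo, __m128i hi)
{
    lo = _mm_srai_epi32 (_mm_slli_epi32 (lo, 16), 16);
    hi = _mm_srai_epi32 (_mm_slli_epi32 (hi, 16), 16);
    return _mm_packs_epi32 (lo, hi);
}
```

<p>Either helper could stand in for the final <code>_mm_packus_epi32 (t0, t1)</code> call. The second costs two shifts per input register and no constants, but it silently truncates instead of saturating if a lane ever exceeds 0xFFFF.</p>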
 
