Bit popcount for large buffer, with Core 2 CPU (SSSE3)
<p>I'm looking for the fastest way to popcount a large buffer of 512 or more bytes. I can guarantee any required alignment, and the buffer size is always a power of 2. The buffer corresponds to block allocations, so typically the bits are either all set, none set, or mostly set toward the "left" of the buffer, with occasional holes.</p>
<p>Some solutions I've considered are:</p>
<ul>
<li>GCC's <a href="http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html" rel="nofollow noreferrer"><code>__builtin_popcount</code></a></li>
<li><a href="http://www.dalkescientific.com/writings/diary/archive/2008/07/05/bitslice_and_popcount.html" rel="nofollow noreferrer">Bitslice <code>popcount_24words</code></a></li>
<li><a href="http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan" rel="nofollow noreferrer">Counting bits set, Brian Kernighan's way</a></li>
</ul>
<p>I'm interested in the fastest solution; it must work on 32-bit x86 chips of the Core 2 generation or more recent. SSE and other SIMD approaches are of great interest. I'll be testing on the following quad-core CPU:</p>
<pre><code>matt@stanley:~/anacrolix/public/stackoverflow$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
stepping : 11
cpu MHz : 1600.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
bogomips : 4800.21
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
</code></pre>
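<p>For reference, here is a minimal scalar baseline that the candidates above could be compared against. It is only a sketch and not a claim about which approach is fastest. The function names <code>popcount_kernighan</code> and <code>popcount_buffer</code> are made up for illustration, and the code assumes GCC or Clang (for <code>__builtin_popcount</code>) and a buffer viewed as 32-bit words. The early exits for all-clear and all-set words are an assumption based on the bit distribution described above.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernighan's method: the loop runs once per set bit in the word. */
unsigned popcount_kernighan(uint32_t w)
{
    unsigned n = 0;
    while (w) {
        w &= w - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical baseline: count set bits in a buffer of nwords 32-bit words.
 * Words that are entirely clear or entirely set take a cheap early path;
 * every other word goes through GCC's __builtin_popcount. */
size_t popcount_buffer(const uint32_t *buf, size_t nwords)
{
    size_t total = 0;
    size_t i;

    assert(buf != NULL || nwords == 0);

    for (i = 0; i < nwords; i++) {
        uint32_t w = buf[i];
        if (w == 0)
            continue;                 /* nothing set in this word */
        if (w == UINT32_MAX)
            total += 32;              /* every bit set */
        else
            total += (size_t)__builtin_popcount(w);
    }
    return total;
}
```

<p>A 512-byte buffer corresponds to <code>nwords = 128</code>. Whether the extra branches help or hurt depends on how predictable the data is, so this would need to be measured on the target CPU.</p>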