<p>First of all, unless your function is <strong>really</strong> time-sensitive, do not try to over-optimize. Just use the one you provided: it is easy to verify for correctness, and it doesn't try to be smart just for the heck of it.</p>
<p>If the function really needs to be <strong>fast</strong>, there are many ways to optimize it further. Some of them expect or assume a specific memory layout for your strings (for example, that they are allocated on word boundaries and that each allocation is padded up to a word boundary). So you need to be careful: an algorithm might work on one combination of processor, compiler and memory allocator and fail miserably on others.</p>
<p>Just for the heck of it, here are some possible ways to speed up the character counter:</p>
<ul>
<li>Read the string a word (a 32- or 64-bit integer) at a time. This is not necessarily much of a help, thanks to L1 caching and speculative/out-of-order execution. It also needs an end-of-loop adjustment for the last word, so that bytes after the NUL terminator are not miscounted. Use this only when the memory allocator guarantees word-aligned, padded allocations.</li>
<li>Remove the conditional: calculate the counts for all characters (into an array) and return the count for the wanted character. This removes one point of conditional branching, and if you know the string length in advance it also makes the loop an excellent candidate for unrolling.</li>
<li>If you know the length of the string beforehand (calculated somewhere else), you can use that to unroll the loop. Or better, write it as a for-loop and apply a suitable #pragma and compiler options to make the compiler do the loop unrolling for you.</li>
<li>Write the routine in assembler. <em>Before</em> going this way, crank up all compiler optimizations and disassemble the routine first. You are likely to find that the compiler already used all the tricks you knew and several you didn't.</li>
<li>If your string is potentially very large (megabytes), and here I am speculating, using a graphics card via OpenCL/CUDA might offer some potential.</li>
</ul>
<p>And so on.</p>
<p>But I really, <em>really suggest</em> you stick with the one you have if this is a real-world problem. If this is a toy problem and you are optimizing for the fun of it, go ahead.</p>
<p>Cycle-shaving is a fun way to learn CPUs and instruction sets, but for 99.999999...% of programming tasks it is not worth the effort.</p>
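<p>To make two of the ideas from the list above concrete, here is a minimal sketch in C. The original function is not shown in this answer, so <code>count_char_simple</code> is only an assumed shape for it, and all three function names are hypothetical. The table variant counts every byte value and looks up the wanted one afterwards; the known-length variant is a plain counted for-loop with no branch in its body, which is the form a compiler can most easily unroll. Whether either one is actually faster depends on your compiler and CPU, so measure before adopting it.</p>

```c
#include <limits.h>
#include <stddef.h>

/* Baseline: an assumed shape for the simple, easy-to-verify loop. */
size_t count_char_simple(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c)
            ++n;
    }
    return n;
}

/* Table variant: tally every byte value, then return the wanted count.
 * The loop body has no comparison against c, only the NUL check. */
size_t count_char_table(const char *s, char c)
{
    size_t counts[UCHAR_MAX + 1] = {0};
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p)
        ++counts[*p];
    return counts[(unsigned char)c];
}

/* Known-length variant: the trip count is fixed up front and the
 * comparison result (0 or 1) is added directly, so there is no
 * conditional branch in the loop body. */
size_t count_char_n(const char *s, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i)
        n += (s[i] == c);
    return n;
}
```

<p>All three return the same count for the same input, so the simple version can serve as a reference when testing the others.</p>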