<p>If you insist on instrumenting binaries, pretty much your fastest coverage technique is the 5-byte jump-out/jump-back trick. (You're covering standard ground for binary instrumentation tools.)</p>
<p>The INT 3 solution will always involve a trap. Yes, you could handle the trap in your own space instead of a debugger's space, and that would speed it up, but it will never be close to competitive with the jump-out/back patch. You may need it as a backup anyway, if the function you are instrumenting happens to be shorter than 5 bytes (e.g., "inc eax/ret"), because then you don't have 5 bytes you can patch.</p>
<p>What you might do to optimize things a little is examine the patched code. Without such examination, original code like this:</p>
<pre><code>       instrn 1
       instrn 2
       instrn N
next:
</code></pre>
<p>patched, in general, to look like this:</p>
<pre><code>       jmp patch
       xxx
next:
</code></pre>
<p>generally has to have a patch like this:</p>
<pre><code>patch: pushf
       inc count
       popf
       instrn1
       instrn2
       instrnN
       jmp back
</code></pre>
<p>If all you want is <em>coverage</em>, you don't need to increment, and that means you don't need to save the flags:</p>
<pre><code>patch: mov byte ptr covered,1
       instrn1
       instrn2
       instrnN
       jmp back
</code></pre>
<p>You should use a <em>byte</em> rather than a word to keep the patch size down. You should align the patch on a cache line so the processor doesn't have to fetch 2 cache lines to execute the patch.</p>
<p>If you insist on counting, you can analyze instrn1/2/N to see if they care about the flags that "inc" fools with, and only pushf/popf if needed, or you can insert the increment between two instructions in the patch that don't care. You must be analyzing these to some extent anyway to handle complications such as an instrn being <strong>ret</strong>; in that case you can generate a better patch (e.g., don't "jmp back").</p>
<p>You may find that using <strong>add count,1</strong> is faster than <strong>inc count</strong> because this avoids partial condition code updates and the consequent pipeline interlocks. This will affect your condition-code impact analysis a bit, since <strong>inc</strong> doesn't set the carry bit, and <strong>add</strong> does.</p>
<p>Another possibility is PC sampling. Don't instrument the code at all; just interrupt the thread periodically and take a sample PC value. If you know where the basic blocks are, a PC sample anywhere in a basic block is evidence that the entire block got executed. This won't necessarily give precise coverage data (you may miss critical PC values), but the overhead is pretty low.</p>
<p>If you are willing to patch <em>source</em> code, you can do better: just insert "covered[i]=true;" at the beginning of the ith basic block, and let the compiler take care of all the various optimizations. No patches needed. The really cool part of this is that if you have basic blocks <em>inside</em> nested loops, and you insert source probes like this, the compiler will notice that the probe assignments are idempotent with respect to the loop and lift the probe out of the loop. Voilà, zero probe overhead inside the loop. What more could you want?</p>
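<p>To make the 5-byte jump-out/jump-back patch concrete, here is a minimal sketch in C of how the jump bytes might be computed. It relies on the x86 near-jump encoding (opcode 0xE9 followed by a 32-bit displacement measured from the end of the jump). The helper name <code>encode_jmp_rel32</code> and the addresses used with it are hypothetical; the function only fills a buffer and does not touch any running code.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build the 5 bytes of an x86 "jmp rel32" that,
 * placed at address `from`, transfers control to address `to`.
 * The displacement is relative to the byte after the jump (from + 5).
 * Returns 0 on success, -1 if the target is out of rel32 range. */
static int encode_jmp_rel32(uint64_t from, uint64_t to, uint8_t out[5])
{
    assert(out != NULL);

    int64_t disp = (int64_t)(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;                        /* near jump opcode */
    out[1] = (uint8_t)(rel & 0xFF);       /* displacement, little-endian */
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
    return 0;
}
```

<p>The same helper would serve for both directions: the "jmp patch" written over the original instructions and the "jmp back" at the end of the patch. A real tool would also have to pad any leftover bytes of partially overwritten instructions and make the code page writable, which this sketch leaves out.</p>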
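<p>The source-level approach can be sketched as follows. This is only an illustration: the function <code>sum_positive</code>, the block numbering, and <code>NUM_BLOCKS</code> are invented for the example, and whether a given compiler actually hoists the probe out of the loop depends on the compiler and optimization level.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_BLOCKS = 4 };          /* hypothetical: one slot per basic block */
static bool covered[NUM_BLOCKS];  /* covered[i] is set once block i has run */

/* Example function with a probe at the start of each basic block. */
static int sum_positive(const int *v, size_t n)
{
    covered[0] = true;            /* block 0: function entry */
    assert(v != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        covered[1] = true;        /* block 1: loop body */
        if (v[i] > 0) {
            covered[2] = true;    /* block 2: then-branch */
            sum += v[i];
        }
    }
    covered[3] = true;            /* block 3: function exit */
    return sum;
}

/* Count how many blocks have been reached so far. */
static size_t blocks_covered(void)
{
    size_t hit = 0;
    for (size_t i = 0; i < NUM_BLOCKS; i++)
        if (covered[i])
            hit++;
    return hit;
}
```

<p>Calling <code>sum_positive</code> with only negative inputs leaves <code>covered[2]</code> unset, which is exactly the kind of gap a coverage report is meant to expose.</p>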