How is a relative JMP (x86) implemented in an Assembler?
<p>While building my assembler for the x86 platform I encountered some problems with encoding the <code>JMP</code> instruction:</p>
<pre><code>OPCODE  INSTRUCTION  SIZE
EB cb   JMP rel8     2
E9 cw   JMP rel16    4 (because of 0x66 16-bit prefix)
E9 cd   JMP rel32    5
...
</code></pre>
<p>(<em>from my favourite x86 instruction website, <a href="http://siyobik.info/index.php?module=x86&amp;id=147" rel="noreferrer">http://siyobik.info/index.php?module=x86&amp;id=147</a></em>)</p>
<p>All are <strong>relative</strong> jumps; the third column gives the size of each encoding (opcode + operand).</p>
<p>My original design (which is flawed for exactly this reason) reserved the maximum space (5 bytes) for each such instruction. The operand is not known at that point, because the jump targets a location that has not been assembled yet. So I implemented a "rewrite" mechanism that patches the operand in place in memory once the jump target is known, and fills the remaining bytes with <code>NOP</code>s. That padding is a somewhat serious concern in tight loops.</p>
<p>Now my problem is with the following situation:</p>
<pre><code>b: XXX
c: JMP a
e: XXX
   ...
   XXX
d: JMP b
a: XXX
(where XXX is any instruction, depending on the to-be assembled program)
</code></pre>
<p>The problem is that I want the smallest possible encoding for a <code>JMP</code> instruction (and no <code>NOP</code> filling).</p>
<p>I have to know the size of the instruction at <code>c</code> before I can calculate the relative distance between <code>a</code> and <code>b</code> for the operand at <code>d</code>. The same applies to the <code>JMP</code> at <code>c</code>: it needs to know the size of <code>d</code> before it can calculate the relative distance between <code>e</code> and <code>a</code>.</p>
<p><strong>How do existing assemblers solve this problem, or how would you do this?</strong></p>
<p>This is what I am thinking of, which I believe solves the problem:</p>
<blockquote>
<p>First encode all the instructions between the <code>JMP</code> and its target to opcodes; if this region contains a variable-sized opcode, use its maximum size, e.g. <code>5</code> for a <code>JMP</code>. Then encode the relative <code>JMP</code> to its target by calculating the distance and choosing the smallest possible encoding size (2, 4 or 5). Whenever a variable-sized opcode is encoded, update all absolute operands before it and all relative instructions that skip over it: they are re-encoded when their operand changes, again choosing the smallest possible size. This method is guaranteed to terminate, because variable-sized opcodes can only shrink (they start at their maximum size).</p>
</blockquote>
<p>I wonder, <em>perhaps this is an over-engineered solution</em>, which is why I am asking this question.</p>
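<p>The proposal in the blockquote can be sketched roughly as follows. This is only a minimal illustration, not how any particular assembler does it: the <code>Item</code> class and the <code>layout</code> and <code>relax</code> functions are hypothetical names, every non-jump instruction is reduced to a fixed byte count, and the 4-byte <code>rel16</code> form is left out for brevity, so a jump is either 2 bytes (<code>rel8</code>) or 5 bytes (<code>rel32</code>).</p>

```python
from dataclasses import dataclass
from typing import Optional

SHORT, NEAR = 2, 5  # encoded sizes in bytes of JMP rel8 and JMP rel32


@dataclass
class Item:
    """One instruction (hypothetical model): fixed-size, or a JMP to a label."""
    label: Optional[str] = None   # label defined at this instruction, if any
    size: int = 1                 # current encoded size in bytes
    target: Optional[str] = None  # set only for JMP items


def layout(items):
    """Return ({label: address}, [start address of each item])."""
    labels, addrs, pc = {}, [], 0
    for it in items:
        addrs.append(pc)
        if it.label:
            labels[it.label] = pc
        pc += it.size
    return labels, addrs


def relax(items):
    """Start every jump at NEAR, then shrink to SHORT until nothing changes."""
    for it in items:
        if it.target:
            it.size = NEAR  # pessimistic starting point
    changed = True
    while changed:
        changed = False
        labels, addrs = layout(items)
        for it, addr in zip(items, addrs):
            if it.target and it.size == NEAR:
                tgt = labels[it.target]
                if tgt > addr:
                    # Forward jump: shrinking this jump pulls its target closer.
                    tgt -= NEAR - SHORT
                # rel8 is measured from the end of the 2-byte short form.
                disp = tgt - (addr + SHORT)
                if -128 <= disp <= 127:
                    it.size = SHORT
                    changed = True
    return layout(items)[0]


# The situation from the question, with 121 bytes of filler between e and d.
program = [
    Item("b", 1),
    Item("c", target="a"),
    Item("e", 1),
    Item(size=121),
    Item("d", target="b"),
    Item("a", 1),
]
final_labels = relax(program)
```

<p>With this filler size the jump at <code>c</code> fits in <code>rel8</code> on the first pass, and only because it shrank does the jump at <code>d</code> fit on the second pass, which is the interdependence described above. Because sizes only ever decrease, every pass either shrinks at least one jump or stops. Note that a shrink-only scheme may settle on a layout that is valid but not the smallest possible; starting every jump at its shortest form and only growing the ones that do not fit is the other common direction.</p>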
 
