<p>I've written a number of assemblers over the years with hand-written parsers and, frankly, you're probably better off using a grammar language and a parser generator.</p>

<p>Here's why: a typical assembly line will probably look something like this:</p>

<pre><code>[label:] [instruction|directive][newline]
</code></pre>

<p>and an instruction will be:</p>

<pre><code>plain-mnemonic|mnemonic-withargs
</code></pre>

<p>and a directive will be:</p>

<pre><code>plain-directive|directive-withargs
</code></pre>

<p>etc.</p>

<p>With a decent parser generator like <a href="http://www.devincook.com/goldparser/" rel="noreferrer">Gold</a>, you should be able to knock out a grammar for 8051 in a few hours. The advantage over hand parsing is that you can support fairly complicated expressions in your assembly code, like:</p>

<pre><code>.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)
</code></pre>

<p>which can be a real bear to do by hand.</p>

<p>If you want to do it by hand, make a table of all your mnemonics. Each entry should also record the addressing modes the mnemonic supports and, for each addressing mode, the number of bytes that variant takes and its opcode. Something like this:</p>

<pre><code>typedef enum {
    Implied = 1,
    Direct = 2,
    Extended = 4,
    Indexed = 8
    // etc
} AddressingMode;

/* For a 4 char mnemonic, this struct will be 5 bytes. A typical small processor
 * has on the order of 100 instructions, making this table come in at ~500 bytes when all
 * is said and done.
 * The time to binary search that will be, worst case, 8 compares on the mnemonic.
 * I claim that I/O will take way more time than lookup.
 * You will also need a table and/or a routine that, given a mnemonic and addressing mode,
 * will give you the actual opcode.
 */
typedef struct {
    char Mnemonic[4];     /* NUL-padded, not necessarily NUL-terminated */
    char AddressingMode;  /* bitmask of the AddressingMode values above */
} InstructionInfo;

/* order them by mnemonic */
static InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
    { {'A', 'D', 'D', 'A'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', 'A'}, Direct|Extended|Indexed }
};
/* etc */

static int nInstrs = sizeof(instrs)/sizeof(instrs[0]);

InstructionInfo *GetInstruction(char *mnemonic) {
    /* binary search for mnemonic */
}

int InstructionSize(AddressingMode mode) {
    switch (mode) {
    case Implied: return 1;
    /* etc */
    }
}
</code></pre>

<p>Then you will have a list of every instruction, each of which in turn carries the set of addressing modes it supports.</p>

<p>So your parser becomes something like this:</p>

<pre><code>char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &amp;labelLen, nextStart, &amp;nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &amp;mnemonicLen, nextStart, &amp;nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
    AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &amp;nextStart);
    if (IsValidInstruction(mnemonic, info)) {
        GenerateCode(mnemonic, info);
    }
    else throw new BadInstructionException(mnemonic, info);
}
else if (IsDirective()) { /* etc. */ }
</code></pre>
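<p>The <code>GetInstruction</code> stub above leaves the binary search as a comment. Here is a minimal sketch of one way to fill it in, assuming the table is sorted in <code>strncmp</code> order over the four mnemonic bytes and that mnemonics have already been upper-cased. The <code>const</code> qualifiers, the <code>typedef</code>s, the corrected field spelling and the length check are my own additions for illustration, not part of the original table.</p>

```c
#include <stddef.h>
#include <string.h>

/* Addressing-mode flags, mirroring the table in the answer. */
typedef enum { Implied = 1, Direct = 2, Extended = 4, Indexed = 8 } AddressingMode;

/* Mnemonic is NUL-padded to 4 bytes; a 4-char mnemonic has no terminator. */
typedef struct {
    char Mnemonic[4];
    char AddressingMode; /* bitmask of supported modes */
} InstructionInfo;

/* Must stay sorted by mnemonic for the binary search to work. */
static const InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct | Extended | Indexed },
    { {'A', 'D', 'D', 'A'},  Direct | Extended | Indexed },
    { {'S', 'U', 'B', '\0'}, Direct | Extended | Indexed },
    { {'S', 'U', 'B', 'A'},  Direct | Extended | Indexed },
};
static const size_t nInstrs = sizeof(instrs) / sizeof(instrs[0]);

/* Returns the table entry for a NUL-terminated mnemonic, or NULL if absent. */
const InstructionInfo *GetInstruction(const char *mnemonic)
{
    size_t lo = 0, hi = nInstrs; /* search the half-open range [lo, hi) */

    /* Reject over-long input so "ADDAX" cannot match "ADDA". */
    if (strlen(mnemonic) > sizeof(instrs[0].Mnemonic))
        return NULL;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* Compare at most 4 bytes; strncmp stops early at a NUL. */
        int cmp = strncmp(mnemonic, instrs[mid].Mnemonic,
                          sizeof(instrs[mid].Mnemonic));
        if (cmp == 0)
            return &instrs[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

<p>With this in place, <code>GetInstruction("ADDA")</code> returns the second table entry and <code>GetInstruction("MUL")</code> returns <code>NULL</code>. The lookup is case-sensitive, so a real assembler would probably normalize case before calling it.</p>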