    <p>My answer below refers only to the Flex-Bison (or Lex-Yacc) model of compilation; I have little knowledge of other models.</p>
    <p>I think of the lexer/parser combination as two co-operating modules in the same program. When you use Flex with Bison, the parser generated by Bison calls a function <code>yylex()</code> provided by the Flex-generated lexer (<code>yylex()</code> is equivalent to the <code>getNextToken()</code> function in your question). So it makes more sense to think of them as co-operating units in a single program rather than as two different programs. Moreover, if the lexer and parser <em>were</em> two different programs, you'd have to deal with inter-process communication, shared memory, and related issues, further complicating the task at hand.</p>
    <p>To answer your second question: one important issue arises if the parser only comes into action <em>after</em> the lexer has finished reading all the input. Memory usage then grows with the size of the whole input instead of staying roughly constant, because you have to keep a data structure for <em>every token</em> in memory (think of tokens like <code>,</code> and <code>=</code> each occupying multiple bytes and you'll quickly see why it doesn't scale).</p>
    <p>As for error handling: if the lexer cannot match the input to any of its regular expressions, <code>yylex()</code> should return a token code that the grammar does not use. A common way to do this is a catch-all flex rule that returns the offending character itself:</p>
    <pre><code>.    { return yytext[0]; }</code></pre>
    <p>(Note the near-invisible period in the first column, which matches any input character except <code>\n</code>.)</p>
    <p>(<strong>NOTE</strong>: This rule should be the <em>last</em> rule to appear in your flex file. Flex prefers the <em>longest</em> match and, when several rules match the same amount of input, the rule that appears <em>first</em> in the file, so a catch-all placed earlier would take single characters away from the rules after it.)</p>
    <p>Bison treats a token code it does not know as a syntax error and calls <code>yyerror(const char *)</code> (ideally defined by you), exactly as it does when a valid token appears where the grammar does not allow it. So a <em>tokenization error</em> and a <em>parsing error</em> both end up in <code>yyerror()</code>. Avoid returning zero or a negative value such as -1 for this purpose: Bison interprets those as end of input, so the error is reported only if the input happens to be incomplete at that point.</p>
    <p>Also, if you want to display the erroneous piece of code when an error is encountered, you need a way to get from the defective token back to the related source text. Reading the input entirely and parsing afterwards would not work for this unless you store the associated source text with <em>each</em> token while tokenizing, essentially making a memory behemoth of a compiler.</p>
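    <p>The pull model described above can be sketched without Flex or Bison at all. What follows is a minimal hand-written illustration in C, not generated code: <code>next_token()</code> stands in for <code>yylex()</code>, <code>parse_sum()</code> for <code>yyparse()</code>, and <code>report_error()</code> for <code>yyerror()</code>. These names, the token codes, and the toy grammar (numbers separated by <code>+</code>) are assumptions made purely for the example. The point it tries to show is that the parser asks for one token at a time, so no token list is ever built, and that an unrecognized character and a misplaced token reach the same error routine.</p>

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token codes for this sketch (not produced by Bison). */
enum { TOK_EOF = 0, TOK_NUM = 258, TOK_PLUS = 259, TOK_BAD = 260 };

static const char *cursor;  /* next unread character of the input */
static int token_value;     /* value of the most recent TOK_NUM */

/* Stand-in for yylex(): scans and returns exactly one token per call. */
static int next_token(void) {
    while (*cursor == ' ')
        cursor++;
    if (*cursor == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*cursor)) {
        token_value = 0;
        while (isdigit((unsigned char)*cursor))
            token_value = token_value * 10 + (*cursor++ - '0');
        return TOK_NUM;
    }
    if (*cursor == '+') {
        cursor++;
        return TOK_PLUS;
    }
    cursor++;        /* consume the character nothing else matched */
    return TOK_BAD;  /* plays the role of a final catch-all rule */
}

/* Stand-in for yyerror(): one place where every error is reported. */
static void report_error(const char *msg) {
    fprintf(stderr, "error: %s\n", msg);
}

/* Stand-in for yyparse(): accepts NUM ('+' NUM)* and stores the sum.
   Returns 0 on success and 1 on any lexical or syntax error. */
int parse_sum(const char *input, int *sum) {
    int tok;

    cursor = input;
    tok = next_token();  /* pull the first token on demand */
    if (tok != TOK_NUM) {
        report_error(tok == TOK_BAD ? "unrecognized character" : "syntax error");
        return 1;
    }
    *sum = token_value;

    while ((tok = next_token()) == TOK_PLUS) {
        tok = next_token();
        if (tok != TOK_NUM) {
            report_error(tok == TOK_BAD ? "unrecognized character" : "syntax error");
            return 1;
        }
        *sum += token_value;
    }
    if (tok != TOK_EOF) {
        report_error(tok == TOK_BAD ? "unrecognized character" : "syntax error");
        return 1;
    }
    return 0;
}
```

    <p>With these definitions, <code>parse_sum("1 + 22 + 3", &amp;sum)</code> returns 0 and leaves 26 in <code>sum</code>, while <code>parse_sum("1 + $", &amp;sum)</code> and <code>parse_sum("1 2", &amp;sum)</code> both return 1 after calling <code>report_error()</code>. At any moment only the current token is held, which is the property the single-program design relies on.</p>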