There are several different parser traits and base classes for different purposes.

The main trait is `scala.util.parsing.combinator.Parsers`. It has most of the main combinators, like `opt`, `rep`, `elem`, `accept`, etc. Definitely look over the documentation for this one, since it covers most of what you need to know. The actual `Parser` class is defined as an inner class here, which is also important to know about.

Another important trait is `scala.util.parsing.combinator.lexical.Scanners`. This is the base trait for parsers which read a stream of characters and produce a stream of tokens (also known as lexers). To implement this trait, you need to implement a `whitespace` parser, which reads whitespace characters, comments, etc., and a `token` method, which reads the next token. Tokens can be whatever you want, but they must be subclasses of `Scanners.Token`. `Lexical` extends `Scanners`, and `StdLexical` extends `Lexical`. The former provides some useful basic operations (like `digit` and `letter`), while the latter actually defines and lexes common tokens (like numeric literals, identifiers, strings, and reserved words). You just have to define `delimiters` and `reserved`, and you get something useful for most languages. The token definitions are in `scala.util.parsing.combinator.token.StdTokens`.

Once you have a lexer, you can define a parser which reads a stream of tokens (produced by the lexer) and generates an abstract syntax tree. Separating the lexer and the parser is a good idea, since you won't need to worry about whitespace, comments, or other complications in your syntax. If you use `StdLexical`, consider using `scala.util.parsing.combinator.syntactical.StdTokenParsers`, which has parsers built in to translate tokens into values (e.g., `StringLit` into `String`). I'm not sure what the difference is with `StandardTokenParsers`. If you define your own token classes, you should just use `Parsers` for simplicity.

You specifically asked about `RegexParsers` and `JavaTokenParsers`. `RegexParsers` is a trait which extends `Parsers` with one additional combinator: `regex`, which does exactly what you would expect. Mix `RegexParsers` into your lexer if you want to use regular expressions to match tokens. `JavaTokenParsers` provides some parsers which lex tokens from Java syntax (like identifiers and integers), but without the token baggage of `Lexical` or `StdLexical`.

To summarise, you probably want two parsers: one which reads characters and produces tokens, and one which takes tokens and produces an AST. Use something based on `Lexical` or `StdLexical` for the first. Use something based on `Parsers` or `StdTokenParsers` for the second, depending on whether you use `StdLexical`.
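
To make the two-layer setup concrete, here's roughly what it looks like with `StandardTokenParsers` (which, as far as I can tell, is just `StdTokenParsers` with a `StdLexical` lexer already wired in as `lexical`). The toy expression grammar and the `Expr` case classes below are made up for the example; the point is the shape: configure `reserved` and `delimiters` on the lexer, build the grammar out of token-level parsers like `ident` and `numericLit`, and feed the result a `lexical.Scanner`.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Toy expression language, purely for illustration.
object ExprParsers extends StandardTokenParsers {
  // StdLexical is already available as `lexical`; just tell it which
  // words and symbols to treat as keywords and delimiters.
  lexical.reserved ++= Set("let", "in")
  lexical.delimiters ++= Set("+", "*", "(", ")", "=")

  sealed trait Expr
  case class Num(value: Int) extends Expr
  case class Var(name: String) extends Expr
  case class Add(l: Expr, r: Expr) extends Expr
  case class Mul(l: Expr, r: Expr) extends Expr
  case class Let(name: String, bound: Expr, body: Expr) extends Expr

  // ident and numericLit come from StdTokenParsers; they turn
  // Identifier/NumericLit tokens into plain Strings.
  def expr: Parser[Expr] = letExpr | sum

  def letExpr: Parser[Expr] =
    ("let" ~> ident) ~ ("=" ~> expr) ~ ("in" ~> expr) ^^ {
      case name ~ bound ~ body => Let(name, bound, body)
    }

  def sum: Parser[Expr] =
    product ~ rep("+" ~> product) ^^ { case p ~ ps => ps.foldLeft(p)(Add(_, _)) }

  def product: Parser[Expr] =
    factor ~ rep("*" ~> factor) ^^ { case f ~ fs => fs.foldLeft(f)(Mul(_, _)) }

  def factor: Parser[Expr] =
    numericLit ^^ (s => Num(s.toInt)) |
      ident ^^ (name => Var(name)) |
      "(" ~> expr <~ ")"

  // The lexer's Scanner turns the character stream into the token stream
  // that the token-level parsers consume.
  def parse(source: String): ParseResult[Expr] =
    phrase(expr)(new lexical.Scanner(source))
}
```

Running `ExprParsers.parse("let x = 2 in x * (3 + 4)")` should give a `Success` wrapping `Let(x,Num(2),Mul(Var(x),Add(Num(3),Num(4))))`.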
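
If instead you want to hand-roll the lexer (the `Scanners`/`Lexical` route with your own token classes), the two things you must supply are `whitespace` and `token`. A rough sketch, with invented token classes and a made-up `#` comment syntax:

```scala
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.input.CharArrayReader.EofCh

// Lexical provides digit, letter, whitespaceChar, chrExcept and the
// Token/ErrorToken/EOF machinery; we supply our own token classes plus
// the two required parsers, `whitespace` and `token`.
class SimpleLexer extends Lexical {
  case class NumberTok(chars: String) extends Token
  case class WordTok(chars: String) extends Token
  case class PunctTok(chars: String) extends Token

  // What to silently skip between tokens: spaces and '#' line comments.
  def whitespace: Parser[Any] =
    rep[Any](whitespaceChar | '#' ~ rep(chrExcept(EofCh, '\n')))

  // How to read one token from the character stream.
  def token: Parser[Token] =
    ( rep1(digit)      ^^ (ds => NumberTok(ds.mkString))
    | rep1(letter)     ^^ (ls => WordTok(ls.mkString))
    | EofCh            ^^^ EOF
    | chrExcept(EofCh) ^^ (c => PunctTok(c.toString))
    )
}

// val lexer  = new SimpleLexer
// val tokens = new lexer.Scanner("width = 42  # trailing comment")
// tokens.first  // WordTok(width); keep calling .rest to walk the token stream
```

A parser over these tokens would then just extend `Parsers` (or `TokenParsers`) and match on `NumberTok`, `WordTok`, etc. directly.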
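
And since you asked about `RegexParsers` and `JavaTokenParsers` specifically: for small grammars you can skip the separate lexer entirely. `JavaTokenParsers` extends `RegexParsers`, skips whitespace for you, and provides ready-made parsers such as `ident`, `wholeNumber` and `stringLiteral`, while `regex` lets you drop a `Regex` in wherever you need one. The key/value grammar below is invented for illustration:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Parses simple "key = value" settings, where a value is a hex number,
// an integer or a double-quoted string.
object SettingParser extends JavaTokenParsers {
  def setting: Parser[(String, Any)] =
    ident ~ ("=" ~> value) ^^ { case key ~ v => (key, v) }

  def value: Parser[Any] =
    hexNumber |
      wholeNumber ^^ (_.toInt) |
      stringLiteral ^^ (s => s.substring(1, s.length - 1)) // strip the surrounding quotes

  // RegexParsers also lets you use a Regex directly as a parser.
  def hexNumber: Parser[Int] =
    """0x[0-9a-fA-F]+""".r ^^ (s => Integer.parseInt(s.drop(2), 16))
}

// SettingParser.parseAll(SettingParser.setting, "retries = 3")       // Success((retries,3), ...)
// SettingParser.parseAll(SettingParser.setting, "name = \"demo\"")   // Success((name,demo), ...)
// SettingParser.parseAll(SettingParser.setting, "mask = 0xFF")       // Success((mask,255), ...)
```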
 
