Note that there are some explanatory texts on larger screens.

plurals
  1. POPractical consequences of formal grammar power?
    primarykey
    data
    text
    <p>Every undergraduate Intro to Compilers course reviews the commonly-implemented subsets of context-free grammars: LL(k), SLR(k), LALR(k), LR(k). We are also taught that for any given k, each of those grammars is a subset of the next.</p> <p>What I've never seen is an explanation of what sorts of programming language syntactic features might require moving to a different language class. There's an obvious practical motivation for GLR parsers, namely, avoiding an unholy commingling of parser and symbol table when parsing C++. But what about the differences between the two "standard" classes, LL and LR? </p> <p>Two questions:</p> <ol> <li><b>What (general) syntactic constructions can be parsed with LR(k) but not LL(k')?</b></li> <li><b>In what ways, if any, do those constructions manifest as desirable language constructs?</b></li> </ol> <p>There's a plausible argument for reducing language power by making k as small as possible, because a language requiring many, many tokens of lookahead will be harder for humans to parse, as well as "harder" for machines to parse. Question (2) implicitly asks if the same reasoning ends up holding between classes, as well as within a class.</p> <hr> <p>edit: Here's one example to illustrate the sorts of answers I'm looking for, but for regular languages instead of context-free:</p> <p>When describing a regular language, one usually gets three operators: <code>+</code>, <code>*</code>, and <code>?</code>. Now, you can remove <code>+</code> without reducing the power of the language; instead of writing <code>x+</code>, you write <code>xx*</code>, and the effect is the same. But if <code>x</code> is some big and hairy expression, the two <code>x</code>s are likely to diverge over time due to human forgetfulness, yielding a syntactically correct regular expression that doesn't match the original author's intent. Thus, even though adding <code>+</code> doesn't strictly add power, it does make the notation less error-prone.</p> <p>Are there constructs with similar practical (human?) effects that must be "removed" when switching from LR to LL?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. COIn case you're casting about for examples, there are plenty of language features in widely-used languages that are neither LL nor LR. **Indentation** in Python isn't context-free. (The high-level grammar is LL(1), but the lexer can't be!) **C/C++ declarations** such as `x * y;` require the parser to know if *x* is a typename (if it's not, that's multiplication), which makes the language context-sensitive. **Fixity declarations** in Haskell let you declare operator precedence on the fly. And only perl can parse Perl.
      singulars
    2. CODescending into trivia: In ECMAScript, `/` followed by a space usually means division. But it might be the start of a regular expression, depending on the preceding code. And semicolons are automatically inserted where needed. Both are a pain, but bizarrely enough I don't think they make the language context-sensitive. I might be forgetting something, but I think ECMAScript is LR. (That is, you could write an LR grammar, if you were willing to spell out all the places where a newline is ignored. Such a grammar would of course be rather ridiculous-looking.)
      singulars
    3. COHowdy Jason! I had read that Python's approach is a custom lexer that manually inserts IDENT/DEDENT tokens, which feeds into a generated parser. Seems like a pretty sane approach, all things considered. Dynamic operator precedence is a separate issue than LL/LR, I think? It requires operator precedence parsing of expressions, which can be accomplished within any type of parser. The regex example from ES is interesting! I hadn't thought of it before.
      singulars
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload