StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>This is because all the rules expose a "raw" <code>ast::expression</code> and this is a fixed type.</p> <p>It has clearly been a choice for simplicity in this sample: the benefits are</p> <ul> <li>By making all expression nodes the same type, the tree visitation code can be simple and uniform.</li> <li>All rules have the same 'signature' and follow the same pattern. It's easy to reason about.</li> <li>In this particular example (mini_c) the code-gen phase benefits by inheriting the same simplicity.</li> </ul> <p>The usual way to have more flexible AST that <strong>follows the <em>semantics</em></strong> more closely would be by making <code>expression</code> a variant instead: that way each expression can directly contain the actual "concrete" subexpression type instead of "wading" through intermediate levels of node types, just to <strong>mimic the <em>grammar structure</em> instead of the <em>semantics</em></strong>.</p> <blockquote> <p>I think I have several examples of such asts and corresponding grammars on this site, will see if I can link one later. </p> <p><sup><em>In fact, I think the <code>typedef variant<...> statement</code> in the same example (<code>ast.hpp</code>) is not a bad example of this approach.</em></sup></p> <p>Relevant links:</p> <ul> <li><a href="https://stackoverflow.com/questions/8464969/boostspirit-expression-parser/8468822#8468822">Boost::Spirit Expression Parser</a></li> <li><p><a href="https://stackoverflow.com/questions/17012941/boostspirit-how-to-parse-and-call-c-function-like-expressions/17014063#17014063">Boost::spirit how to parse and call c++ function-like expressions</a> this being a question which I answered by providing a fully featured expression parser (with 'builtin function' evaluation), in two ways:</p> <ul> <li><a href="https://stackoverflow.com/a/17013713/85371">A 'bells and whistles' approach with a full AST</a> (resembling what we're doing here with the mini_c sample expressions)</li> <li><a href="https://stackoverflow.com/a/17014063/85371">A 'pragmatic' approach which evaluates on-the-fly using just <em>semantic actions</em></a> - note this is optimal for storage, but has some downsides, which I address in that post</li> </ul></li> </ul> </blockquote> <p>For now, if you don't wish to alter the grammar (so as not to "lose" the simplicity) you could instead do a transformation on the AST (a "simplify" pass on the expressions, so to speak).</p> <p>I'm going to see what I can come up with in the next hour.</p> <p>I've just refactored the grammar so that it doesn't result in such deep nesting. </p> <ol> <li><p>First, let's reduce the test to a standalone test bed that just parses expressions and shows how a simple expression (<code>"42"</code>) parses to a deeply nested AST: <a href="http://coliru.stacked-crooked.com/a/5467ca41b0ac1d03" rel="nofollow noreferrer">http://coliru.stacked-crooked.com/a/5467ca41b0ac1d03</a> </p> <pre><code><expr> ... <success></success> <attributes>[[[[[[[42, []], []], []], []], []], []]]</attributes> </expr> </code></pre></li> <li><p>Next, let's remove the root problem: the grammar returns an invariant type (<code>ast::expression</code>) which is too heavy in many cases. Instead, we'd like to return an <code>ast::operand</code> (which is variant, and <em>can</em> contain the same <code>ast::expression</code> node): <a href="http://coliru.stacked-crooked.com/a/00e43b1f61db018c" rel="nofollow noreferrer">http://coliru.stacked-crooked.com/a/00e43b1f61db018c</a></p></li> <li><p>Lastly we'd like all rules to become <em>variant</em> as well and returning <em>either</em> an <code>expression</code> iff there are trailing operations, or <em>just</em> a lone <code>operand</code> in the other case, in pseudo code:</p> <pre><code>logical_or_expr = (logical_and_expr >> +(logical_or_op > logical_and_expr) | logical_and_expr ; </code></pre> <p><sub><strong>Note</strong> the subtle use if <code>+(...)</code> instead of <code>*(...)</code> to mandate at least one trailing <em>logical_or operation</em></sub></p> <p>Now, Spirit will have to be told how to assign the first branch to an <code>operand</code> attribute. <code>qi::attr_cast<ast::expression>(...)</code> should have been the fix here, but sadly I ran into undefined behaviour (this took the most time). I settled for a more verbose solution:</p> <pre><code>_logical_or_expr = logical_and_expr >> +(logical_or_op > logical_and_expr) ; logical_or_expr = _logical_or_expr | logical_and_expr; </code></pre> <p>As you can probably see, it's just the same, but with the first branch extracted to a separate rule, so we can just declare the rules to expose the desired attributes:</p> <pre><code>qi::rule<It, ast::operand(), Skipper> logical_or_expr; qi::rule<It, ast::expression(), Skipper> _logical_or_expr; </code></pre> <p>Indeed doing this for each precedence level of subexpressions, results in the following:</p> <pre><code>_logical_or_expr = logical_and_expr >> +(logical_or_op > logical_and_expr) ; _multiplicative_expr = unary_expr >> *(multiplicative_op > unary_expr) ; _additive_expr = multiplicative_expr >> +(additive_op > multiplicative_expr) ; _relational_expr = additive_expr >> +(relational_op > additive_expr) ; _equality_expr = relational_expr >> +(equality_op > relational_expr) ; _logical_and_expr = equality_expr >> +(logical_and_op > equality_expr) ; logical_or_expr = _logical_or_expr | logical_and_expr ; logical_and_expr = _logical_and_expr | equality_expr ; equality_expr = _equality_expr | relational_expr ; relational_expr = _relational_expr | additive_expr ; additive_expr = _additive_expr | multiplicative_expr ; multiplicative_expr = _multiplicative_expr | attr_cast<ast::operand, ast::operand> (unary_expr) ; </code></pre></li> </ol> <p>The end result is here: <a href="http://coliru.stacked-crooked.com/a/8539757bb02fca34" rel="nofollow noreferrer">http://coliru.stacked-crooked.com/a/8539757bb02fca34</a> (sadly it is just too much for Coliru), and it prints:</p> <pre><code><expr> ... </logical_or_expr> <success></success> <attributes>[[42, []]]</attributes> </expr> </code></pre> <p><strong>CAVEAT</strong> Note that this adaptation will <strong><em>NOT</em></strong> make the parser more efficient! In fact, it will just result in a boatload of backtracking (the debug output counts 925 lines instead of just 45 in <strong>Step 1</strong> (!!)).</p> <p>Now, there will be some room to optimize using look-ahead assertions and/or semantic actions, but in general you will have to consider that optimizing for AST storage is going to cost CPU time.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload