Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Well there are two parts to your question. The first part is parsing the expression </p> <pre><code>(+ 1 2 (+ 3 4)) </code></pre> <p>The second part is transforming the output to the result that you want. To get a good understanding of these principles, I highly recommend Udacity's <a href="https://www.udacity.com/course/cs262">Programming Languages course.</a> Carin Meier's <a href="http://gigasquidsoftware.com/blog/2013/05/01/growing-a-language-with-clojure-and-instaparse/">blog post</a> is also quite helpful. </p> <p>The best way to understand how the parser will work, is to break it down into smaller parts. So in the first we'll just examine some parsing rules, and in the second part we'll build our sexps.</p> <ol> <li><p><strong>A simple example</strong> </p> <p>You will first need to write a grammar that tells instaparse how to parse the given expression. We'll start by just parsing the number <code>1</code>:</p> <pre><code>(def parser (insta/parser "sexp = number number = #'[0-9]+' ")) </code></pre> <p>sexp describes the highest level grammar for the sexpression. Our grammar states that the sexp can only have a number. The next line states that the number can be any digit 0-9, and the <code>+</code> is similar to the regex <code>+</code> which means that it must have one number repeated any number of times. If we run our parser we get the following parse tree:</p> <pre><code>(parser "1") =&gt; [:sexp [:number "1"]] </code></pre> <p><em>Ingoring Parenthesis</em></p> <p>We can ignore certain values by adding angled brackets <code>&lt;</code> to our grammar. So if we want to parse <code>"(1)"</code> as simply <code>1</code> we can right our grammar as:</p> <pre><code>(def parser (insta/parser "sexp = lparen number rparen &lt;lparen&gt; = &lt;'('&gt; &lt;rparen&gt; = &lt;')'&gt; number = #'[0-9]+' ")) </code></pre> <p>and if we run the parser again, it will ignore the left and right parenthesis:</p> <pre><code>(parser "(1)") =&gt; [:sexp [:number "1"]] </code></pre> <p>This will become helpful when we write the grammar for sexp below. </p> <p><em>Adding Spaces</em> </p> <p>Now happens if we add spaces and run <code>(parser "( 1 )")</code>? Well we get an error: </p> <pre><code>(parser "( 1 )") =&gt; Parse error at line 1, column 2: ( 1 ) ^ Expected: #"[0-9]+" </code></pre> <p>That's because we haven't defined the concept of space in our grammar! So we can add spaces as such:</p> <pre><code>(def parser (insta/parser "sexp = lparen space number space rparen &lt;lparen&gt; = &lt;'('&gt; &lt;rparen&gt; = &lt;')'&gt; number = #'[0-9]+' &lt;space&gt; = &lt;#'[ ]*'&gt; ")) </code></pre> <p>Again the <code>*</code> is similar to the regex <code>*</code> and it means zero or more than one occurrence of a space. That means the following examples will all return the same result:</p> <pre><code>(parser "(1)") =&gt; [:sexp [:number "1"]] (parser "( 1 )") =&gt; [:sexp [:number "1"]] (parser "( 1 )") =&gt; [:sexp [:number "1"]] </code></pre></li> <li><p><strong>Building the Sexp</strong></p> <p>We're slowly going to build our grammar from the ground up. It might be useful to look at the final product <a href="https://gist.github.com/pooster/6221672">here</a>, just to give an overview of where we're headed. </p> <p>So, an sexp contains more than just numbers as defined by our simple grammar. One high level view we can have of sexp is to view them as an operation between two parenthesis. So basically as a <code>( operation )</code>. We can write this directly into our grammar.</p> <pre><code>(def parser (insta/parser "sexp = lparen operation rparen &lt;lparen&gt; = &lt;'('&gt; &lt;rparen&gt; = &lt;')'&gt; operation = ??? ")) </code></pre> <p>As I stated above, the angled brackets <code>&lt;</code> tell instaparse to ignore these values when it is making the parse tree. Now what is an operation? Well an operation consists of an operator, like <code>+</code>, and some arguments, like the numbers <code>1</code> and <code>2</code>. So we can say write our grammar as:</p> <pre><code>(def parser (insta/parser "sexp = lparen operation rparen &lt;lparen&gt; = &lt;'('&gt; &lt;rparen&gt; = &lt;')'&gt; operation = operator + args operator = '+' args = number number = #'[0-9]+' ")) </code></pre> <p>We stated only one possible operator, <code>+</code>, just to keep things simple. We have also included the number grammar rule from the simple example above. Our grammar, however, is very limited. The only valid sexp it can parse is <code>(+1)</code>. That's because we haven't included the concept of spaces, and have stated that args can have only one number. So in this step we will do two things. We will add spaces, and we will state that args can have more than one number. </p> <pre><code>(def parser (insta/parser "sexp = lparen operation rparen &lt;lparen&gt; = &lt;'('&gt; &lt;rparen&gt; = &lt;')'&gt; operation = operator + args operator = '+' args = snumber+ &lt;snumber&gt; = space number &lt;space&gt; = &lt;#'[ ]*'&gt; number = #'[0-9]+' ")) </code></pre> <p>We added <code>space</code> by using the space grammar rule we defined in the simple example. We created a new <code>snumber</code> which is defined as <code>space</code> and a <code>number</code>, and added the <code>+</code> to snumber to state that it must appear once but it can repeat any number of times. So we can run our parser as so:</p> <pre><code>(parser "(+ 1 2)") =&gt; [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]] </code></pre> <p>We can make our grammar more robust by having <code>args</code> reference back to <code>sexp</code>. That way we can have sexp in our sexp! We can do this by creating <code>ssexp</code> which adds a <code>space</code> to <code>sexp</code> and then add <code>ssexp</code> to <code>args</code>.</p> <pre><code>(def parser (insta/parser "sexp = lparen operation rparen &lt;lparen&gt; = &lt;'('&gt; &lt;rparen&gt; = &lt;')'&gt; operation = operator + args operator = '+' args = snumber+ ssexp* &lt;ssexp&gt; = space sexp &lt;snumber&gt; = space number &lt;space&gt; = &lt;#'[ ]*'&gt; number = #'[0-9]+' ")) </code></pre> <p>Now we can run </p> <pre><code>(parser "(+ 1 2 (+ 1 2))") =&gt; [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"] [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]]]]] </code></pre></li> <li><p><strong>Transformations</strong></p> <p>This step can be done using any number of tools that work on trees, such enlive, zippers, match, and tree-seq. Instaparse, however, also includes its own useful function called <code>insta\transform</code>. We can build our transformations by replacing the keys in our parse tree by the valid clojure functions. For example, <code>:number</code> becomes <code>read-string</code> to turn our strings into valid numbers, <code>:args</code> becomes <code>vector</code> to build our arguments. </p> <p>So, we want to transform this:</p> <pre><code> [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]] </code></pre> <p>Into this:</p> <pre><code> (identity (apply + (vector (read-string "1") (read-string "2")))) =&gt; 3 </code></pre> <p>We can do that by defining our transformation options:</p> <pre><code>(defn choose-op [op] (case op "+" +)) (def transform-options {:number read-string :args vector :operator choose-op :operation apply :sexp identity }) </code></pre> <p>The only tricky thing here was adding the function <code>choose-op</code>. What we want, is to pass the function <code>+</code> to <code>apply</code>, but if we replace <code>operator</code> with <code>+</code> it will use <code>+</code> as a regular function. So it will transform our tree to this:</p> <pre><code> ... (apply (+ (vector ... </code></pre> <p>But by using <code>choose-op</code> it will pass <code>+</code> as an argument to <code>apply</code> as such:</p> <pre><code> ... (apply + (vector ... </code></pre></li> </ol> <p><strong>Conclusion</strong></p> <p>We can now run our little interpreter by putting the parser and transformer together:</p> <pre><code>(defn lisp [input] (-&gt;&gt; (parser input) (insta/transform transform-options))) (lisp "(+ 1 2)") =&gt; 3 (lisp "(+ 1 2(+ 3 4))") =&gt; 10 </code></pre> <p>You can find the final code used in this tutorial <a href="https://gist.github.com/pooster/6221672">here</a>. </p> <p>Hopefully, this short introduction is enough to get going on your own projects. You can new lines by declaring a grammar for <code>\n</code> and you can even choose to not ignore spaces in your parse tree by removing the angled brackets <code>&lt;</code>. That might be helpful given that you're trying to keep the indentation. Hope this helps, If not just write a comment! </p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload