Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>If You do not want exactly the kind of string representation You described, and if You are satisfied with the read/show methods got by <code>deriving</code>, then they make the job. As far as I know, <code>read</code> and <code>show</code> are derived so that they behave the inverse of one another, thus You can use them for saving to file and loading back.</p> <p>If You need to handle other string representation formats, then You have to write a <strong>parser</strong> manually for Your tree datatype. Your work can be helped by strong and industry-strength libraries on this topic.</p> <p>There are much better tutorials on their usage than my words I could say about them. If my intro turns out to be inappropriate, do not lose Your confidence, there are good papers on the topics You can read e.g.</p> <ul> <li><a href="http://fileadmin.cs.lth.se/cs/Education/EDAN40/assignment4/parser.pdf" rel="nofollow">Parsing with Haskell</a> by Lennart Andersson</li> <li><a href="http://www.staff.science.uu.nl/~fokke101/article/parsers/parsers.ps" rel="nofollow">Functional parsers</a> by Jeroen Fokker</li> </ul> <p>or if You want a concrete library help, You can see <a href="http://legacy.cs.uu.nl/daan/parsec.html" rel="nofollow">Parsec</a> or <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Text-ParserCombinators-ReadP.html" rel="nofollow">ReadP</a>.</p> <p>Here I try my summary:</p> <p>A parser is almost something that can convert a string to the value the string represents. The details are a little more complex. A parser not only provides You the value (the intended representation of the string), but it also gives back the "remaining" string.</p> <p>For example, if You have a parser for natural numbers, then let us begin with the following sample string:</p> <p><em>"1 2 3 hello 4 5 6"</em></p> <p>If You parse it with the parser for a natural number, then the parser is expected to provide íou the following two information:</p> <ul> <li>of course it must give You the natural number <strong>1</strong>. I will call that it is the <strong>value</strong> given by the parser. (I typeset here values with bold, and draw them on figures with thick arrows).</li> <li>and also, it must give You also the remaining string <em>" 2 3 hello"</em>. I will call that the parser "leaves" this remaining string from the original string it parsed, or that the parser "consumed" some prefixing part "out of" the "original" string, and "left" a "remaining" string. (I typeset here strings subject to consuming with italic fonts, and draw them on figures with dotted arrows and white arrowheads).</li> </ul> <p>For this remaining string, You can apply the same parser again, which also will leave a (second) remaining string, ad for that, you can apply the parser a third time:</p> <ul> <li>for the second time, the parser provides information that it parsed out the value of number <strong>2</strong> successfully "out of" the string <em>" 2 3 hello"</em> left by the previous parser, and now the remaining string will be <em>" 3 hello 4 5 6"</em>.</li> <li>for the third time, the parser provides information that it parsed out number <strong>3</strong> successfully "out of" the string <em>" 3 hello 4 5 6"</em> left by the previous parser, and now the remaining string will be the remaining string <em>" hello 4 5 6"</em>.</li> </ul> <p>(For simplicity's sake, let us assume that the parser handles whitespaces as intended)</p> <p>The main point is that when we talk about using parsers in sequence, we mean that each time a parser leaves a <em>remaining string</em>, and the next parser will be applied to the remaining string left by the previous parser.</p> <pre><code> string ╭───────╮ remaining string′ ╭───────╮ remaining string″ ┄┄┄┄┄┄┄┄▷ │PARSER1│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │PARSER2│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ ╰───┰───╯ ╰───┰───╯ ┃ ┃ ┃ ┃ ▼ ▼ value parsed out of value parsed out of original string rem. str′ left by prev parser </code></pre> <p>Here:</p> <pre><code>"1 2 3 hello 4" ╭──────────╮ " 2 3 hello 4" ╭──────────╮ " 3 hello 4" ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │NAT-PARSER│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▶ │NAT-PARSER│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷... ╰────┰─────╯ ╰─────┰────╯ ┃ ┃ ┃ ┃ ▼ ▼ 1 2 </code></pre> <p>But what happens if You try to apply the parser in the following case?</p> <pre><code> " hello 4" ╭──────────╮ ??? ┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │NAT-PARSER│ ┄┄┄┄┄┄┄┄┄┄┄▷ ╰─────┰────╯ ┃ ┃ ▼ ??? </code></pre> <p>A parser of natural number cannot parse the string <em>" hello 4"</em>, thus it "fails". Of course there is yet 4 in the string, but it is yet inaccessible to the parser, as it "has not reached there". In such situations, the parser fails.</p> <p>Thus, a parser is something that takes a string, and either provides You with a pair of two pieces of information, or it fails. In case of not failing, the pair of the two pieces of information are:</p> <ul> <li>the value (the intended representation of something in a prefix part of the string)</li> <li>the remaining part of the string.</li> </ul> <p>Thus, for string <em>" hello 4"</em> the parser <code>NAT</code> fails, but for string <em>"1 2 3 hello 4"</em> it succeeds, providing us the information pair <code>(1, " 2 3 hello 4")</code>.</p> <p>We can parse also on string <em>"hello 4"</em>, but 4 is available only after having applied another parser that parses the <em>"hello"</em> part of the string out of the way of the following natural number parser.</p> <pre><code> " hello 4" ╭───────────╮ " 4" ╭──────────╮ "" ┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │WORD-PARSER│ ┄┄┄┄┄┄┄┄▷ │NAT-PARSER│ ┄┄┄┄┄┄┄▷ ╰─────┰─────╯ ╰────┰─────╯ ┃ ┃ ┃ ┃ ▼ ▼ "hello" 4 </code></pre> <p>We see that a parser has a type: a WORD-PARSER is a parser of String, a NAT-PARSER is a parser of Integer (or Nat, if defined explicitly)</p> <p>We have seen also that parsers can fail, and if they do not fail, then they provides two peices of information, both the value and the remainder. We can grasp it by Haskell like this:</p> <pre><code>type Parser a = String -&gt; Maybe (a, String) </code></pre> <p>(For some reasons not mentioned, <code>data Parser a = Prs (String -&gt; Maybe (a, String))</code> is better, but intended the essence is the same.)</p> <p>Thus, a parser of type <code>Parser a</code> is essentially a function that takes a string,</p> <ul> <li>and it yields either <code>Nothing</code> for that string (that means that the parser has "failed" on that string)</li> <li><p>or it yields <code>Just (a, String)</code>, a pair, and we can get everything out of the pair:</p> <ul> <li>the value of type <code>a</code> represented by some prefix part of the string,</li> <li>and the remaining string too. We usually need this too, because we can use this remaining string for continuing the parsing.</li> </ul></li> </ul> <p>We usually want to parse several things from a string, and later we summarize the parsed representations into a single structure.</p> <p>A successful parsing of the tree <code>Node 1 Empty Empty</code> out of string <em>"1 (() ())"</em> can look like that</p> <pre><code> "(1 () ())" ╭──╮ "1 () ())" ╭───╮ "() ())" ╭────╮ "())" ╭────╮ ")" ╭──╮ "" ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │OP│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │LBL│ ┄┄┄┄┄┄┄┄┄┄▷ │TREE│ ┄┄┄┄┄┄┄┄┄▷ │TREE│ ┄┄┄┄┄┄┄▷ │CL│ ┄┄┄┄┄┄┄┄▷ string has been completely consumed ╰┰─╯ ╰─┰─╯ ╰──┰─╯ ╰──┰─╯ ╰┰─╯ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ▼ ▼ ▼ ▼ ▼ (Value not used) 1 Empty Empty (Value not used) </code></pre> <p><code>OP</code> is a parser specifically for opening parenthesis. The value it provides is not important for us here, its more important role is that it takes off the '(' out of the parsed text (more precisely, it yields a remaining string that lacks the leading '(' character from the original string). Thus, it takes away an obstacle from our road, because we can use the <code>LBL</code> "label parser" for the remaining string. It parses the node label out of the string. It also leaves another remaining string, from which we can parse out the two subtrees. (Of course this will mean that we will use recursion, here in this example, let it parse Empty tree). At the and, we have to clean also the trailing <em>")"</em> out of the remained string, for this, we use the <code>CL</code> closing-parenthesis string. This is good, because this way we can parse several concatenated tree string representations, nor "garbage" is left in the string while parsing.</p> <p>Several important things cannot be seen yet from the figures. We have parsed three values: a label (here, <code>1</code>), a subtree (here, <code>Empty</code>), and the other subtree (<code>Empty</code>). How do we get them summarized into the composite single structure <code>Node 1 Empty Empty</code>? How do we really make such parsing chains?</p> <p>The answer is that we do not simply chain isolated parsers, but rather we make each subsequent parsers "dependent" on the provided value of the previous parser:</p> <pre><code> The value which PARAMPARSERₙ would parse ▲ out of remain. string′ ┃ ┃ ┃ string ╭──────╮ remain. string′ ╭─────┸──────╮ remaining string″ ┄┄┄┄┄┄┄┄┄┄▷ │PARSER│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ │PARAMPARSERₙ│ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄▷ ╰──┰───╯ ╰───────────▲╯ ┃ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ value n </code></pre> <p>In this figure, PARSER is a parser, PARAMPARSER_ is a "parametrized parser". A parametrized parser is not a standalone parser on its own, rather it depends on a (previously parsed) value: it gets the value of a previous parser, and only then becomes a standalone parser.</p> <p>In the figure<code>PARAMPARSER_</code> receives the value <em>n</em> from the previous parser, thus it becomes a parser notated here as <code>PARAMPARSERₙ</code>.</p> <p>For example, imagine the following little "language", where exactly the following strings are considered as valid</p> <ul> <li><em>"0"</em></li> <li><em>"1a"</em></li> <li><em>"2ab"</em></li> <li><em>"3abc"</em></li> <li><em>"4abdx"</em></li> </ul> <p>The language look like this: every sentence begins with a natural number, than it must be followed by exactly that much letter character, that the number indicated. Sentences of the wrong number of letters must be not considered a valid sentence in this simple example-language.</p> <p>Then we would like to use a NAT parser (capable of parsing a natural number out of a string).</p> <p>But how to deal with the requirement that letter characters must be parsed exactly that many times as the number indicates?</p> <p>Let us imagine a parametrized parser, <code>PARSE_AS_MANY_LETTERS_LIKE_THIS_</code>. This"parametrized" parser is dependent on a natural number, i.e. it "takes" a natural number <em>n</em> and "becomes" a parser "parametrized" by this natural number, <code>PARSE_AS_MANY_LETTERS_LIKE_THISₙ</code>, which is meant as "parse exactly <em>n</em> times".</p> <ul> <li>For example <code>PARSE_AS_MANY_LETTERS_LIKE_THIS_</code> parametrized by number 1 is a parser that matches exactly one letter occurrence, and gives back it as the value. Thus <code>PARSE_AS_MANY_LETTERS_LIKE_THIS₁</code> is a parser that when applied on string <em>"abc"</em>, it provides as value the string <em>"a"</em>, and gives a remaining string <em>"bc"</em></li> <li><code>PARSE_AS_MANY_LETTERS_LIKE_THIS_</code> parametrized by 2: is a parser that matches exactly two consecuting letter occurrences. E.g. <code>PARSE_AS_MANY_LETTERS_LIKE_THIS₂</code> is a parser that when applied on string <em>"abc"</em>, it provides value <strong>"ab"</strong>, and leaves a remaining string <em>"c"</em></li> <li><code>PARSE_AS_MANY_XS_LIKE_THIS_</code> parametrized by 3: is a parser that matches exactly three consecuting letter occurrences. Thus <code>PARSE_AS_MANY_XS_LIKE_THIS₃</code> is a parser that when applied on string <em>"abc"</em>, it provides value <strong>"abb"</strong>, and leaves a remaining string <em>""</em></li> </ul> <p>Now we can parse the minilanguage in the example:</p> <pre><code> "ab" ▲ ┃ ┃ "2abxy" ╭───╮ "abxy" ╭────────────┸───────────────────╮ "xy" ┄┄┄┄┄┄┄┄┄┄┄▷ │NAT│ ┄┄┄┄┄┄▷ │PARSE_AS_MANY_LETTERS_LIKE_THIS₂│ ┄┄┄┄┄┄┄┄┄▷ ╰─┰─╯ ╰────────────────────────────── ▲╯ ┃ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ value 2 </code></pre> <p>This construction behaves like we expected: Originally, it parsed string <em>"2abxy"</em>, it left as remainder string <em>"xy"</em>, it parsed out exactly a valid sentence of our minilanguage example, and it yielded the "valuable" parts of the sentences (the letters).</p> <p>Expressed in Haskell:</p> <pre><code>nat :: Parser Nat ... parseAsManyLettersLikeThis :: Nat -&gt; Parser String ... </code></pre> <p>and we parse the sting <em>"2abxy"</em> with the "chained" parser <code>nat &gt;&gt;= parseAsManyLettersLikeThis</code>:</p> <pre><code>chained :: Parser String chained = nat &gt;&gt;= parseAsManyLettersLikeThis </code></pre> <p>The compond parser <code>parseAsManyLettersLikeThis</code> parses <em>"2abxy"</em> sucessfully with value <strong>"ab"</strong>, leaving the last two characters in the string. It is essentally a function that when taking <em>"2abxy"</em>, it returns <code>Just ("ab", "xy")</code>. It utilizes the first constituent parser to control the next part of the parsing process.</p> <p>Haskell provides us some syntactic sugar to express all this in a more readable way:</p> <pre><code>nat :: Parser Nat ... parseAsManyLettersLikeThis :: Nat -&gt; Parser String ... chained :: Parser String chained = do n &lt;- nat letters &lt;- parseAsManyLettersLikeThis n return letters </code></pre> <p>This notation is more easy to read, and even more resembling to a familiar imperative programming language. Still, it can be seen that <code>parseAsManyLettersLikeThis</code> is <strong>indeed</strong> dependent on the first parser <code>nat</code>, as <code>parseAsManyLettersLikeThis</code> uses the value <em>n</em> provided previously by <code>nat</code>.</p> <p>Another notation of Haskell that shows this dependence more explicitly:</p> <pre><code>nat :: Parser Nat ... parseAsManyLettersLikeThis :: Nat -&gt; Parser String ... chained :: Parser String chained = nat &gt;&gt;= \n -&gt; parseAsManyLettersLikeThis n </code></pre> <p>Now we see that there exists a "natural" chaining operation between passers, i.e we can chain a parser and a parametrized parser in a natural way (this operation is notated by <code>&gt;&gt;=</code> in Haskell, and its meaning is shown by the figure).</p> <p>There are also other operations between parsers. Haskell has nice libraries on parsing, and also the language itself has notations that allow us to use parser in an intuitive and natural way. See links on parsing in Haskell, the library I best know is <a href="http://legacy.cs.uu.nl/daan/parsec.html" rel="nofollow">Parsec</a> (mentioned in the beginning of my answer). There is also the <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Text-ParserCombinators-ReadP.html" rel="nofollow">ReadP</a> parsing library in Haskell Hierarchical Libraries.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload