Note that there are some explanatory texts on larger screens.

plurals
  1. POThreading extra state through a parser in Scala
    primarykey
    data
    text
    <h2>I'll give you the tl;dr up front</h2> <p>I'm trying to use the state monad transformer in <a href="https://github.com/scalaz/scalaz" rel="nofollow noreferrer">Scalaz 7</a> to thread extra state through a parser, and I'm having trouble doing anything useful without writing a <em>lot</em> of <code>t m a -&gt; t m b</code> versions of <code>m a -&gt; m b</code> methods.</p> <h2>An example parsing problem</h2> <p>Suppose I have a string containing nested parentheses with digits inside them:</p> <pre><code>val input = "((617)((0)(32)))" </code></pre> <p>I also have a stream of fresh variable names (characters, in this case):</p> <pre><code>val names = Stream('a' to 'z': _*) </code></pre> <p>I want to pull a name off the top of the stream and assign it to each parenthetical expression as I parse it, and then map that name to a string representing the contents of the parentheses, with the nested parenthetical expressions (if any) replaced by their names.</p> <p>To make this more concrete, here's what I'd want the output to look like for the example input above:</p> <pre><code>val target = Map( 'a' -&gt; "617", 'b' -&gt; "0", 'c' -&gt; "32", 'd' -&gt; "bc", 'e' -&gt; "ad" ) </code></pre> <p>There may be either a string of digits or arbitrarily many sub-expressions at a given level, but these two kinds of content won't be mixed in a single parenthetical expression. </p> <p>To keep things simple, we'll assume that the stream of names will never contain either duplicates or digits, and that it will always contain enough names for our input.</p> <h2>Using parser combinators with a bit of mutable state</h2> <p>The example above is a slightly simplified version of the parsing problem in <a href="https://stackoverflow.com/q/12442615/334519">this Stack Overflow question</a>. I <a href="https://stackoverflow.com/a/12443270/334519">answered that question</a> with a solution that looked roughly like this:</p> <pre><code>import scala.util.parsing.combinator._ class ParenParser(names: Iterator[Char]) extends RegexParsers { def paren: Parser[List[(Char, String)]] = "(" ~&gt; contents &lt;~ ")" ^^ { case (s, m) =&gt; (names.next -&gt; s) :: m } def contents: Parser[(String, List[(Char, String)])] = "\\d+".r ^^ (_ -&gt; Nil) | rep1(paren) ^^ ( ps =&gt; ps.map(_.head._1).mkString -&gt; ps.flatten ) def parse(s: String) = parseAll(paren, s).map(_.toMap) } </code></pre> <p>It's not too bad, but I'd prefer to avoid the mutable state. </p> <h2>What I want</h2> <p>Haskell's <a href="http://hackage.haskell.org/package/parsec-3.1.3" rel="nofollow noreferrer">Parsec</a> library makes adding user state to a parser trivially easy:</p> <pre><code>import Control.Applicative ((*&gt;), (&lt;$&gt;), (&lt;*)) import Data.Map (fromList) import Text.Parsec paren = do (s, m) &lt;- char '(' *&gt; contents &lt;* char ')' h : t &lt;- getState putState t return $ (h, s) : m where contents = flip (,) [] &lt;$&gt; many1 digit &lt;|&gt; (\ps -&gt; (map (fst . head) ps, concat ps)) &lt;$&gt; many1 paren main = print $ runParser (fromList &lt;$&gt; paren) ['a'..'z'] "example" "((617)((0)(32)))" </code></pre> <p>This is a fairly straightforward translation of my Scala parser above, but without mutable state.</p> <h2>What I've tried</h2> <p>I'm trying to get as close to the Parsec solution as I can using Scalaz's state monad transformer, so instead of <code>Parser[A]</code> I'm working with <code>StateT[Parser, Stream[Char], A]</code>. I have a "solution" that allows me to write the following:</p> <pre><code>import scala.util.parsing.combinator._ import scalaz._, Scalaz._ object ParenParser extends ExtraStateParsers[Stream[Char]] with RegexParsers { protected implicit def monadInstance = parserMonad(this) def paren: ESP[List[(Char, String)]] = (lift("(" ) ~&gt; contents &lt;~ lift(")")).flatMap { case (s, m) =&gt; get.flatMap( names =&gt; put(names.tail).map(_ =&gt; (names.head -&gt; s) :: m) ) } def contents: ESP[(String, List[(Char, String)])] = lift("\\d+".r ^^ (_ -&gt; Nil)) | rep1(paren).map( ps =&gt; ps.map(_.head._1).mkString -&gt; ps.flatten ) def parse(s: String, names: Stream[Char]) = parseAll(paren.eval(names), s).map(_.toMap) } </code></pre> <p>This works, and it's not that much less concise than either the mutable state version or the Parsec version.</p> <p>But my <code>ExtraStateParsers</code> is ugly as sin—I don't want to try your patience more than I already have, so I won't include it here (although <a href="https://gist.github.com/3747234" rel="nofollow noreferrer">here's a link</a>, if you really want it). I've had to write new versions of every <code>Parser</code> and <code>Parsers</code> method I use above for my <code>ExtraStateParsers</code> and <code>ESP</code> types (<code>rep1</code>, <code>~&gt;</code>, <code>&lt;~</code>, and <code>|</code>, in case you're counting). If I had needed to use other combinators, I'd have had to write new state transformer-level versions of them as well.</p> <p>Is there a cleaner way to do this? I'd love to see an example of a Scalaz 7's state monad transformer being used to thread state through a parser, but Scalaz 6 or Haskell examples would also be useful and appreciated.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload