Note that there are some explanatory texts on larger screens.

plurals
  1. POHaskell Parsec - error messages are less helpful while using custom tokens
    primarykey
    data
    text
    <p>I'm working on seperating lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using some tokens other than Parsec's Char tokens.</p> <p>Here are some examples of Parsec's error messages while using Char tokens:</p> <pre><code>ghci&gt; P.parseTest (string "asdf" &gt;&gt; spaces &gt;&gt; string "ok") "asdf wrong" parse error at (line 1, column 7): unexpected "w" expecting space or "ok" ghci&gt; P.parseTest (choice [string "ok", string "nop"]) "wrong" parse error at (line 1, column 1): unexpected "w" expecting "ok" or "nop" </code></pre> <p>So, string parser shows what string is expected when found an unexpected string, and choice parser shows what are alternatives.</p> <p>But when I use same combinators with my tokens:</p> <pre><code>ghci&gt; Parser.parseTest ((tok $ Ide "asdf") &gt;&gt; (tok $ Ide "ok")) "asdf " parse error at "test" (line 1, column 1): unexpected end of input </code></pre> <p>In this case, it doesn't print what was expected.</p> <pre><code>ghci&gt; Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf " parse error at (line 1, column 1): unexpected (Ide "asdf","test" (line 1, column 1)) </code></pre> <p>And when I use <code>choice</code>, it doesn't print alternatives.</p> <p>I expect this behavior to be related with combinator functions, and not with tokens, but seems like I'm wrong. How can I fix this?</p> <p>Here's the full lexer + parser code: </p> <p>Lexer:</p> <pre><code>module Lexer ( Token(..) , TokenPos(..) , tokenize ) where import Text.ParserCombinators.Parsec hiding (token, tokens) import Control.Applicative ((&lt;*), (*&gt;), (&lt;$&gt;), (&lt;*&gt;)) data Token = Ide String | Number String | Bool String | LBrack | RBrack | LBrace | RBrace | Keyword String deriving (Show, Eq) type TokenPos = (Token, SourcePos) ide :: Parser TokenPos ide = do pos &lt;- getPosition fc &lt;- oneOf firstChar r &lt;- optionMaybe (many $ oneOf rest) spaces return $ flip (,) pos $ case r of Nothing -&gt; Ide [fc] Just s -&gt; Ide $ [fc] ++ s where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_" rest = firstChar ++ ['0'..'9'] parsePos p = (,) &lt;$&gt; p &lt;*&gt; getPosition lbrack = parsePos $ char '[' &gt;&gt; return LBrack rbrack = parsePos $ char ']' &gt;&gt; return RBrack lbrace = parsePos $ char '{' &gt;&gt; return LBrace rbrace = parsePos $ char '}' &gt;&gt; return RBrace token = choice [ ide , lbrack , rbrack , lbrace , rbrace ] tokens = spaces *&gt; many (token &lt;* spaces) tokenize :: SourceName -&gt; String -&gt; Either ParseError [TokenPos] tokenize = runParser tokens () </code></pre> <p>Parser:</p> <pre><code>module Parser where import Text.Parsec as P import Control.Monad.Identity import Lexer parseTest :: Show a =&gt; Parsec [TokenPos] () a -&gt; String -&gt; IO () parseTest p s = case tokenize "test" s of Left e -&gt; putStrLn $ show e Right ts' -&gt; P.parseTest p ts' tok :: Token -&gt; ParsecT [TokenPos] () Identity Token tok t = token show snd test where test (t', _) = case t == t' of False -&gt; Nothing True -&gt; Just t </code></pre> <p>SOLUTION:</p> <p>Ok, after fp4me's answer and reading Parsec's Char source more carefully, I ended up with this:</p> <pre><code>{-# LANGUAGE FlexibleContexts #-} module Parser where import Text.Parsec as P import Control.Monad.Identity import Lexer parseTest :: Show a =&gt; Parsec [TokenPos] () a -&gt; String -&gt; IO () parseTest p s = case tokenize "test" s of Left e -&gt; putStrLn $ show e Right ts' -&gt; P.parseTest p ts' type Parser a = Parsec [TokenPos] () a advance :: SourcePos -&gt; t -&gt; [TokenPos] -&gt; SourcePos advance _ _ ((_, pos) : _) = pos advance pos _ [] = pos satisfy :: (TokenPos -&gt; Bool) -&gt; Parser Token satisfy f = tokenPrim show advance (\c -&gt; if f c then Just (fst c) else Nothing) tok :: Token -&gt; ParsecT [TokenPos] () Identity Token tok t = (Parser.satisfy $ (== t) . fst) &lt;?&gt; show t </code></pre> <p>Now I'm getting same error messages:</p> <blockquote> <p>ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) " asdf"<br> parse error at (line 1, column 1):<br> unexpected (Ide "asdf","test" (line 1, column 3))<br> expecting Ide "ok" or Ide "nop" </p> </blockquote>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. COWhy do you want to seperate lexing from parsing? Surely the main reason for doing this is tradition - it was simpler to write a tricky parser free of the implementation details of the lexer (which was more routine, perhaps just regular expressions), and in imperative languages, it makes thinking easier to seperate the stages. In nice Haskell Parsec land, writing the lexers and the parsers is nice and easy: lex some strings, combine them to parse them - you can almost write the definition of your language in combinators. Also, you're working hard to pass positions through; let Parsec do it.
      singulars
    2. CO@AndrewC, you may be right. I just wanted to see the good and bad parts of separating lexing and parsing stages in parsec. Now after I look my final code, I think I'll go with just parser. (also, once I was using alex+happy to parse an indentation-based grammar and lexing helped me to generate indent+dedent tokens, and let the parser work on simplified grammar. separate lexing stage in parsec could also help in this kind of situations)
      singulars
    3. CO@AndrewC, also, I really love Parsec and I think being able to work on different kind of streams(other than character streams) can be really helpful and writing a lexer helped me to understand how can I do that. Now I know how can I work on byte strings, for example.
      singulars
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload