Note that there are some explanatory texts on larger screens.

plurals
  1. POCan parser combinators be made efficient?
    primarykey
    data
    text
    <p>Around 6 years ago, I benchmarked my own parser combinators in OCaml and found that they were ~5&#215; slower than the parser generators on offer at the time. I recently revisited this subject and benchmarked Haskell's Parsec vs a simple hand-rolled <a href="http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm" rel="noreferrer">precedence climbing parser</a> written in F# and was surprised to find the F# to be 25&#215; faster than the Haskell.</p> <p>Here's the Haskell code I used to read a large mathematical expression from file, parse and evaluate it:</p> <pre><code>import Control.Applicative import Text.Parsec hiding ((&lt;|&gt;)) expr = chainl1 term ((+) &lt;$ char '+' &lt;|&gt; (-) &lt;$ char '-') term = chainl1 fact ((*) &lt;$ char '*' &lt;|&gt; div &lt;$ char '/') fact = read &lt;$&gt; many1 digit &lt;|&gt; char '(' *&gt; expr &lt;* char ')' eval :: String -&gt; Int eval = either (error . show) id . parse expr "" . filter (/= ' ') main :: IO () main = do file &lt;- readFile "expr" putStr $ show $ eval file putStr "\n" </code></pre> <p>and here's my self-contained precedence climbing parser in F#:</p> <pre><code>let rec (|Expr|) = function | P(f, xs) -&gt; Expr(loop (' ', f, xs)) | xs -&gt; invalidArg "Expr" (sprintf "%A" xs) and loop = function | ' ' as oop, f, ('+' | '-' as op)::P(g, xs) | (' ' | '+' | '-' as oop), f, ('*' | '/' as op)::P(g, xs) -&gt; let h, xs = loop (op, g, xs) match op with | '+' -&gt; (+) | '-' -&gt; (-) | '*' -&gt; (*) | '/' | _ -&gt; (/) |&gt; fun op -&gt; loop (oop, op f h, xs) | _, f, xs -&gt; f, xs and (|P|_|) = function | '('::Expr(f, ')'::xs) -&gt; Some(P(f, xs)) | c::_ as xs when '0' &lt;= c &amp;&amp; c &lt;= '9' -&gt; let rec loop n = function | c2::xs when '0' &lt;= c2 &amp;&amp; c2 &lt;= '9' -&gt; loop (10*n + int(string c2)) xs | xs -&gt; Some(P(n, xs)) loop 0 xs | _ -&gt; None </code></pre> <p>My impression is that even state-of-the-art parser combinators waste a lot of time back tracking. Is that correct? If so, is it possible to write parser combinators that generate state machines to obtain competitive performance or is it necessary to use code generation?</p> <p><strong>EDIT:</strong></p> <p>Here's the OCaml script I used to generate a ~2Mb expression for benchmarking:</p> <pre><code>open Printf let rec f ff n = if n=0 then fprintf ff "1" else fprintf ff "%a+%a*(%a-%a)" f (n-1) f (n-1) f (n-1) f (n-1) let () = let n = try int_of_string Sys.argv.(1) with _ -&gt; 3 in fprintf stdout "%a\n" f n </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload