
What are some exotic parsing techniques?
<p>I've been parsing poker hand histories for the past year and have learned a great deal about parsing in general.</p>
<p>We started with regexes but quickly realized that wouldn't scale easily. We switched languages, from Ruby to C++, and finally came to grips with the fact that it was the algorithm that had to change.</p>
<p>We picked up Boost::Spirit and watched our speed rise dramatically, to more than 10 times our original speed. We then moved over to Java and are currently using ANTLR to create grammars for each site. This is definitely the fastest method yet, and it's very thorough, which is nice because you know exactly where you stand in terms of a 'complete' grammar. Unfortunately, I have spent incredible amounts of time working with these grammars -- they work pretty damn well but not perfectly yet.</p>
<p>Anyway, enough with the background, on to the question at hand: are there any 'exotic' or less well-known parsing techniques that I'm not aware of? I only know of lexing/parsing with a grammar and the other, inferior regex/loop method.</p>
<p>For those of you who are not familiar with poker hand histories, I'll post one so you can see what the structure is.</p>
<pre><code>Full Tilt Poker Game #12037626529: Table durrrr (heads up, deep) - $500/$1000 - Pot Limit Omaha Hi - 2:00:48 ET - 2009/05/05
Seat 1: durrrr ($196,456.50)
Seat 2: Gus Hansen ($65,499)
durrrr posts the small blind of $500
Gus Hansen posts the big blind of $1,000
The button is in seat #1
*** HOLE CARDS ***
durrrr raises to $3,000
Gus Hansen raises to $9,000
durrrr calls $6,000
*** FLOP *** [3d 4d 7d]
Gus Hansen has 15 seconds left to act
Gus Hansen checks
durrrr checks
*** TURN *** [3d 4d 7d] [Jh]
Gus Hansen checks
durrrr checks
*** RIVER *** [3d 4d 7d Jh] [Ah]
Gus Hansen has 15 seconds left to act
Gus Hansen checks
durrrr has 15 seconds left to act
123stayfree (Observer): GUS I NOW BRING U LUCK
durrrr bets $7,600
Gus Hansen has 15 seconds left to act
Gus Hansen has requested TIME
Hernandez777 (Observer): Gus has the super-duper nuts
Gus Hansen calls $7,600
Podobed45 (Observer): fluuuuuuuuuush
*** SHOW DOWN ***
durrrr shows [Kc 3s Qd As] two pair, Aces and Threes
Gus Hansen mucks
durrrr wins the pot ($33,199.50) with two pair, Aces and Threes
*** SUMMARY ***
Total pot $33,200 | Rake $0.50
Board: [3d 4d 7d Jh Ah]
Seat 1: durrrr (small blind) collected ($33,199.50)
Seat 2: Gus Hansen (big blind) mucked
</code></pre>
<p>I'm well aware of other methods of collecting the information (such as screen-scraping and DLL injection), but the need to transform the hand history into structured data is still there, so I'm looking only at methods that grab the info, such as regexes/grammars...</p>
<p>I think if I don't find something I'm going to rewrite our grammars with ocamllex/ocamlyacc.</p>
<p><strong>update</strong></p>
<p>FYI: regex speed was ~60 hands/sec, while the grammars were processing 600+ hands/sec. The entire hand is transformed into XML after the data is all sorted out. There are between 20 and 30 regexes needed (at last count) for EACH site you want to parse. On the grammar side, each site has its own grammar with ungodly amounts of lexer/parser rules (but it's still smaller code size).</p>
<p>I do have the dragon book and have been reading through it, which has spurred my interest in using ocamllex/ocamlyacc. Speed is the name of the game here.</p>
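To make the "regex/loop" versus "lexing" distinction above concrete, here is a minimal Python sketch of a line-level lexer for the sample hand. It is not the code described in the question: the token kinds, group names, and the `classify` helper are all hypothetical, and only a handful of line types are covered. The idea it illustrates is compiling the per-line patterns into one alternation so each line is matched once, rather than looping over 20 to 30 separate regexes.

```python
import re

# Hypothetical token kinds for a Full Tilt style history (assumption, not
# from the original post). Order matters: CHAT is tried before ACTION so
# that observer chat containing words like "calls" is not misread.
LINE_PATTERNS = [
    ("STREET", r"\*\*\* (?P<street>[A-Z ]+) \*\*\*(?P<cards>.*)"),
    ("SEAT", r"Seat (?P<seat>\d+): (?P<seat_name>[^()]+?) \(\$(?P<stack>[\d,.]+)\)"),
    ("CHAT", r"(?P<observer>.+) \(Observer\): (?P<message>.*)"),
    ("ACTION", r"(?P<player>.+?) (?P<verb>raises to|calls|bets|checks|mucks)"
               r"(?: \$(?P<amount>[\d,.]+))?"),
]

# One compiled alternation; each alternative is wrapped in a group named
# after its token kind.
LINE_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in LINE_PATTERNS)
)


def classify(line):
    """Return (kind, fields) for one line; unrecognised lines are 'OTHER'."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        return "OTHER", {}
    # Exactly one of the outer kind groups took part in the match.
    kind = next(k for k, _ in LINE_PATTERNS if m.group(k) is not None)
    fields = {
        name: value
        for name, value in m.groupdict().items()
        if value is not None and name != kind
    }
    return kind, fields


if __name__ == "__main__":
    sample = [
        "Seat 2: Gus Hansen ($65,499)",
        "durrrr raises to $3,000",
        "*** FLOP *** [3d 4d 7d]",
        "123stayfree (Observer): GUS I NOW BRING U LUCK",
        "Gus Hansen has 15 seconds left to act",
    ]
    for line in sample:
        print(classify(line))
```

A real grammar still has the advantage the question mentions: it states which line kinds may follow which, so an incomplete rule set shows up as a parse error instead of a silently ignored `OTHER` line.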