  Pyparsing: a list of optional elements: weird issue with Optional, Each, and ordering of parser elements
    <p>I'm trying to parse an XML-like file (with no associated DTD) with pyparsing. Part of each record has the following contents:</p>
    <ul>
    <li>Something within <code>&lt;L&gt;</code> and <code>&lt;/L&gt;</code> tags,</li>
    <li>One or more things within <code>&lt;pc&gt;</code> and <code>&lt;/pc&gt;</code> tags,</li>
    <li>Optionally, something within <code>&lt;MW&gt;</code> and <code>&lt;/MW&gt;</code> tags,</li>
    <li>Optionally, a literal <code>&lt;mul/&gt;</code>, and optionally a literal <code>&lt;mat/&gt;</code></li>
    </ul>
    <p>The ordering of these elements varies.</p>
    <p>So I wrote the following (I'm new to pyparsing; please point out if I'm doing something stupid):</p>
<pre><code>#!/usr/bin/env python

from pyparsing import *

def DumbTagParser(tag):
    tag_close = '&lt;/%s&gt;' % tag
    return Group(
        Literal('&lt;') + Literal(tag).setResultsName('tag') + Literal('&gt;')
        + SkipTo(tag_close).setResultsName('contents')
        + Literal(tag_close)
    ).setResultsName(tag)

record1 = Group(ZeroOrMore(DumbTagParser('pc'))).setResultsName('pcs') &amp; \
    DumbTagParser('L') &amp; \
    Optional(Literal('&lt;mul/&gt;')) &amp; \
    Optional(DumbTagParser('MW')) &amp; \
    Optional(Literal('&lt;mat/&gt;'))

record2 = Group(ZeroOrMore(DumbTagParser('pc'))).setResultsName('pcs') &amp; \
    Optional(DumbTagParser('MW')) &amp; \
    Optional(Literal('&lt;mul/&gt;')) &amp; \
    DumbTagParser('L')

def attempt(s):
    print('Attempting:', s)
    match = record1.parseString(s, parseAll=True)
    print('Match: ', match)
    print()

attempt('&lt;L&gt;1.1&lt;/L&gt;')
attempt('&lt;pc&gt;Page1,1&lt;/pc&gt; &lt;pc&gt;Page1,2&lt;/pc&gt; &lt;MW&gt;000001&lt;/MW&gt; &lt;L&gt;1.1&lt;/L&gt;')
attempt('&lt;mul/&gt;&lt;MW&gt;000003&lt;/MW&gt;&lt;pc&gt;1,1&lt;/pc&gt;&lt;L&gt;3.1&lt;/L&gt;')
attempt('&lt;mul/&gt; &lt;MW&gt;000003&lt;/MW&gt; &lt;pc&gt;1,1&lt;/pc&gt; &lt;L&gt;3.1&lt;/L&gt; ')  # Note end space
</code></pre>
    <p>Both parsers <code>record1</code> and <code>record2</code> fail, with different exceptions. With <code>record1</code>, it fails on the last string (which differs from the penultimate string only in spaces):</p>
<pre><code>pyparsing.ParseException: (at char 47), (line:1, col:48)
</code></pre>
    <p>and with <code>record2</code>, it fails on the penultimate string itself:</p>
<pre><code>pyparsing.ParseException: Missing one or more required elements (Group:({"&lt;" "L" "&gt;" SkipTo:("&lt;/L&gt;") "&lt;/L&gt;"})) (at char 0), (line:1, col:1)
</code></pre>
    <p>Now what is weird is that if I interchange lines 2 and 3 in the definition of <code>record2</code>, then it parses fine!</p>
<pre><code>record2 = Group(ZeroOrMore(DumbTagParser('pc'))).setResultsName('pcs') &amp; \
    Optional(Literal('&lt;mul/&gt;')) &amp; \
    Optional(DumbTagParser('MW')) &amp; \
    DumbTagParser('L')  # parses my example strings fine
</code></pre>
    <p>(Yes, I realise that <code>record2</code> doesn't contain any rule for <code>&lt;mat/&gt;</code>. I'm trying to get a minimal example that reflects this sensitivity to reordering.)</p>
    <p>I'm not sure if this is a bug in pyparsing or in my code, but my real question is how I should parse the kind of strings I want.</p>
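For comparison, the record structure described above (exactly one L, any number of pc, at most one each of MW, mul and mat, in any order) can be sketched without pyparsing at all. The following is only an illustration using the standard-library re module, not a fix for the Each/Optional behaviour in the question. The names TOKEN and parse_record are hypothetical, and the sketch assumes that tags never nest or carry attributes and that zero pc elements are acceptable (as in the ZeroOrMore used in the code, rather than the "one or more" in the prose).

```python
import re
from collections import Counter

# One token is either a paired tag with contents or one of the two empty tags.
# Leading whitespace before a token is skipped. (Hypothetical helper, not pyparsing.)
TOKEN = re.compile(
    r"\s*(?:<(?P<tag>L|pc|MW)>(?P<contents>.*?)</(?P=tag)>"
    r"|<(?P<empty>mul|mat)/>)",
    re.DOTALL,
)


def parse_record(s):
    """Parse one record whose elements may appear in any order.

    Returns a dict with keys 'L', 'pcs', 'MW', 'mul' and 'mat'.
    Raises ValueError on unexpected input or on a violated count constraint.
    """
    s = s.rstrip()  # tolerate trailing whitespace
    record = {"L": None, "pcs": [], "MW": None, "mul": False, "mat": False}
    counts = Counter()
    pos = 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if m is None:
            raise ValueError("unexpected input at char %d" % pos)
        name = m.group("tag") or m.group("empty")
        counts[name] += 1
        if name == "pc":
            record["pcs"].append(m.group("contents"))
        elif name in ("L", "MW"):
            record[name] = m.group("contents")
        else:  # 'mul' or 'mat'
            record[name] = True
        pos = m.end()

    # Order is free, so the constraints are checked on counts after tokenizing.
    if counts["L"] != 1:
        raise ValueError("expected exactly one <L> element, found %d" % counts["L"])
    for name in ("MW", "mul", "mat"):
        if counts[name] > 1:
            raise ValueError("element %r appears more than once" % name)
    return record


if __name__ == "__main__":
    for example in (
        "<L>1.1</L>",
        "<pc>Page1,1</pc> <pc>Page1,2</pc> <MW>000001</MW> <L>1.1</L>",
        "<mul/><MW>000003</MW><pc>1,1</pc><L>3.1</L>",
        "<mul/> <MW>000003</MW> <pc>1,1</pc> <L>3.1</L> ",  # note end space
    ):
        print(parse_record(example))
```

The design choice here is to accept the elements in any order first and enforce the "exactly one" and "at most one" rules afterwards, which sidesteps any sensitivity to the order in which the alternatives are declared.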