<p>I don't know if you still want an answer, but here is my bash at it.</p>
<p>I can see the following problems in your code:</p>
<ul>
<li>You've assigned the same resultsName multiple times, to multiple items. As a dict could eventually be returned, you must either add '*' to each occurrence of the results name or drop it from some of the elements. I'll assume you are after the content and not the tags, and drop the tags' names. FYI, the shortcut for <code>parser.setResultsName(name)</code> is <code>parser(name)</code>.</li>
<li>Setting the results name to 'Contents' for everything is also a bad idea, as we would lose information already available to us. Instead, name each CONTENT by its corresponding TAG.</li>
<li>You are also making multiple items Optional within the ZeroOrMore. They are already 'optional' through the ZeroOrMore, so let's make them alternatives using the '^' (Or) operator, as there is no predefined sequence, i.e. pc tags could precede mul tags or vice versa. It seems reasonable to allow any combination and collect the contents as we go.</li>
<li>As we also have to deal with multiple occurrences of a given tag, we append '*' to each CONTENT's results name so that the results are collected into lists.
</li> </ul>
<p>First we create a function that returns a pair of opening and closing tag parsers; your DumbTagCreator is now called <code>tagset</code>:</p>
<pre><code>from pyparsing import Group, Keyword, Literal, Optional, SkipTo, ZeroOrMore

def tagset(name, keywords=False):
    """Return [opening, closing] suppressed parsers for &lt;name&gt; and &lt;/name&gt;."""
    tag = Keyword(name) if keywords else Literal(name)
    return [Group(Literal('&lt;') + tag + Literal('&gt;')).suppress(),
            Group(Literal('&lt;/') + tag + Literal('&gt;')).suppress()]
</code></pre>
<p>Next we create the parser, which parses <code>&lt;tag&gt;CONTENT&lt;/tag&gt;</code> (where CONTENT is the content we are interested in) and returns a dict of the form <code>{'pc': CONTENT, 'MW': CONTENT, ...}</code>:</p>
<pre><code>tagDict = {name: tagset(name) for name in ['pc', 'MW', 'L', 'mul', 'mat']}

parser = None
for name, (openTag, closeTag) in tagDict.items():
    # Name each CONTENT by its tag; the trailing '*' collects repeats into a list.
    element = openTag + SkipTo(closeTag)(name + '*') + closeTag
    # '^' (Or) lets the elements appear in any order.
    parser = element if parser is None else parser ^ element

# If you have added the &lt;mul/&gt; tag deliberately...
parser = Optional(Literal('&lt;mul/&gt;')) + ZeroOrMore(parser)
# ...or, if you have added the &lt;mul/&gt; tag by accident, use this instead:
# parser = ZeroOrMore(parser)
</code></pre>
<p>and finally we test:</p>
<pre><code>test = ['&lt;L&gt;1.1&lt;/L&gt;',
        '&lt;pc&gt;Page1,1&lt;/pc&gt; &lt;pc&gt;Page1,2&lt;/pc&gt; &lt;MW&gt;000001&lt;/MW&gt; &lt;L&gt;1.1&lt;/L&gt;',
        '&lt;mul/&gt;&lt;MW&gt;000003&lt;/MW&gt;&lt;pc&gt;1,1&lt;/pc&gt;&lt;L&gt;3.1&lt;/L&gt;',
        '&lt;mul/&gt; &lt;MW&gt;000003&lt;/MW&gt; &lt;pc&gt;1,1&lt;/pc&gt; &lt;L&gt;3.1&lt;/L&gt; ']

for item in test:
    result = parser.parseString(item).asDict()
    # list(val) copes with asDict() returning either ParseResults (older pyparsing) or lists (newer)
    print({key: list(val) for key, val in result.items()})
</code></pre>
<p>which should produce, assuming you want a dict of lists (the key order within each dict may vary):</p>
<pre><code>{'L': ['1.1']}
{'pc': ['Page1,1', 'Page1,2'], 'MW': ['000001'], 'L': ['1.1']}
{'pc': ['1,1'], 'MW': ['000003'], 'L': ['3.1']}
{'pc': ['1,1'], 'MW': ['000003'], 'L': ['3.1']}
</code></pre>
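<p>To see the effect of the '*' suffix and the '^' operator in isolation, here is a minimal sketch on a hypothetical <code>key=NUM</code> grammar; the <code>item</code> helper and the sample text are made up for illustration and are not from the question. As with <code>SkipTo(...)(name)</code>, the results name is attached to the value itself, and <code>expr('a')</code> is just the shortcut for <code>expr.setResultsName('a')</code>:</p>
<pre><code>from pyparsing import Literal, Word, ZeroOrMore, nums

def item(key, results_name):
    # Hypothetical "key=NUM" element; only the number carries the results name.
    return Literal(key + '=').suppress() + Word(nums)(results_name)

text = 'b=2 a=1 a=3'

# '^' (Or) lets the items appear in any order.
# Without '*', a name that matches more than once keeps only its last value.
last_only = ZeroOrMore(item('a', 'a') ^ item('b', 'b'))
r1 = last_only.parseString(text)
print(r1['a'], r1['b'])              # 3 2

# With a trailing '*', every match of that name is collected.
all_matches = ZeroOrMore(item('a', 'a*') ^ item('b', 'b*'))
r2 = all_matches.parseString(text)
print(list(r2['a']), list(r2['b']))  # ['1', '3'] ['2']
</code></pre>
<p>This is what the '*' appended to each CONTENT's results name is for: it lets the two pc tags in the second test string come out as <code>['Page1,1', 'Page1,2']</code> rather than only <code>'Page1,2'</code>.</p>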