<p>I started answering this before I realised you were using Beautiful Soup, but here's a parser, written with Python 2's HTMLParser library, that I think works with your example string:</p>
<pre><code>from HTMLParser import HTMLParser

results = {}

class myParse(HTMLParser):

    def __init__(self):
        self.state = ""
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A font tag with class "test-proof" marks the start of a key
        if tag == "font" and attrs.has_key("class") and attrs['class'] == "test-proof":
            self.state = "getKey"

    def handle_endtag(self, tag):
        # Once the key's font tag closes, the text that follows is the value
        if self.state == "getKey" and tag == "font":
            self.state = "getValue"

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return

        if self.state == "getKey":
            self.resultsKey = data
        elif self.state == "getValue":
            # Values split across several tags are joined with spaces
            if results.has_key(self.resultsKey):
                results[self.resultsKey] += " " + data
            else:
                results[self.resultsKey] = data


if __name__ == "__main__":
    p_tags = """&lt;p class="foo-body"&gt;
  &lt;font class="test-proof"&gt;Full name&lt;/font&gt; Foobar&lt;br /&gt;
  &lt;font class="test-proof"&gt;Born&lt;/font&gt; July 7, 1923, foo, bar&lt;br /&gt;
  &lt;font class="test-proof"&gt;Current age&lt;/font&gt; 27 years 226 days&lt;br /&gt;
  &lt;font class="test-proof"&gt;Major teams&lt;/font&gt; &lt;span style="white-space: nowrap"&gt;Japan,&lt;/span&gt; &lt;span style="white-space: nowrap"&gt;Jakarta,&lt;/span&gt; &lt;span style="white-space: nowrap"&gt;bazz,&lt;/span&gt; &lt;span style="white-space: nowrap"&gt;foo,&lt;/span&gt; &lt;span style="white-space: nowrap"&gt;foobazz&lt;/span&gt;&lt;br /&gt;
  &lt;font class="test-proof"&gt;Also&lt;/font&gt; bar&lt;br /&gt;
  &lt;font class="test-proof"&gt;foo style&lt;/font&gt; hand &lt;br /&gt;
  &lt;font class="test-proof"&gt;bar style&lt;/font&gt; ball&lt;br /&gt;
  &lt;font class="test-proof"&gt;foo position&lt;/font&gt; bak&lt;br /&gt;
  &lt;br class="bar" /&gt;&lt;/p&gt;"""

    parser = myParse()
    parser.feed(p_tags)
    print results
</code></pre>
<p>Gives the result:</p>
<pre><code>{'foo position': 'bak', 'Major teams': 'Japan, Jakarta, bazz, foo, foobazz', 'Also': 'bar', 'Current age': '27 years 226 days', 'Born': 'July 7, 1923, foo, bar', 'foo style': 'hand', 'bar style': 'ball', 'Full name': 'Foobar'}
</code></pre>
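<p>The parser above relies on Python 2 features (the <code>HTMLParser</code> module name, <code>dict.has_key</code> and the <code>print</code> statement). As a rough sketch only, assuming Python 3 and its standard <code>html.parser</code> module, the same state-machine idea might look like the following. The names <code>FieldParser</code>, <code>parse_fields</code> and <code>sample</code> are illustrative and not part of the original answer, and the results are kept on the parser instance instead of in a module-level dict.</p>

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collect key/value pairs where each key sits in <font class="test-proof">."""

    def __init__(self):
        super().__init__()
        self.state = ""          # "", "getKey" or "getValue"
        self.current_key = None  # key whose value is being collected
        self.results = {}

    def handle_starttag(self, tag, attrs):
        # A <font class="test-proof"> tag marks the start of a key.
        if tag == "font" and dict(attrs).get("class") == "test-proof":
            self.state = "getKey"

    def handle_endtag(self, tag):
        # After the key's closing </font>, following text is the value.
        if self.state == "getKey" and tag == "font":
            self.state = "getValue"

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if self.state == "getKey":
            self.current_key = data
        elif self.state == "getValue":
            # Values split across several tags are joined with spaces.
            if self.current_key in self.results:
                self.results[self.current_key] += " " + data
            else:
                self.results[self.current_key] = data


def parse_fields(html):
    """Hypothetical helper: parse one HTML fragment and return its fields."""
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Shortened version of the example string, for illustration only.
sample = (
    '<p class="foo-body">'
    '<font class="test-proof">Full name</font> Foobar<br />'
    '<font class="test-proof">Major teams</font> '
    '<span style="white-space: nowrap">Japan,</span> '
    '<span style="white-space: nowrap">Jakarta</span><br />'
    '</p>'
)

if __name__ == "__main__":
    print(parse_fields(sample))
    # {'Full name': 'Foobar', 'Major teams': 'Japan, Jakarta'}
```

<p>Because Python 3.7+ dicts preserve insertion order, the keys come back in document order, unlike the arbitrary ordering in the Python 2 output shown above.</p>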
 
