Using XmlSlurper: How to select sub-elements while iterating over a GPathResult
<p>I am writing an HTML parser, which uses TagSoup to pass a well-formed structure to XmlSlurper.</p>

<p>Here's the generalised code:</p>

<pre><code>def htmlText = """
&lt;html&gt;
&lt;body&gt;
&lt;div id="divId" class="divclass"&gt;
    &lt;h2&gt;Heading 2&lt;/h2&gt;
    &lt;ol&gt;
        &lt;li&gt;&lt;h3&gt;&lt;a class="box" href="#href1"&gt;href1 link text&lt;/a&gt; &lt;span&gt;extra stuff&lt;/span&gt;&lt;/h3&gt;&lt;address&gt;Here is the address&lt;span&gt;Telephone number: &lt;strong&gt;telephone&lt;/strong&gt;&lt;/span&gt;&lt;/address&gt;&lt;/li&gt;
        &lt;li&gt;&lt;h3&gt;&lt;a class="box" href="#href2"&gt;href2 link text&lt;/a&gt; &lt;span&gt;extra stuff&lt;/span&gt;&lt;/h3&gt;&lt;address&gt;Here is another address&lt;span&gt;Another telephone: &lt;strong&gt;0845 1111111&lt;/strong&gt;&lt;/span&gt;&lt;/address&gt;&lt;/li&gt;
    &lt;/ol&gt;
&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;
"""

def html = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText(htmlText)

html.'**'.grep { it.@class == 'divclass' }.ol.li.each { linkItem -&gt;
    def link = linkItem.h3.a.@href
    def address = linkItem.address.text()
    println "$link: $address\n"
}
</code></pre>

<p>I would expect the <code>each</code> to let me select each 'li' in turn so I can retrieve the corresponding href and address details. Instead, I am getting this output:</p>

<pre><code>#href1#href2: Here is the addressTelephone number: telephoneHere is another addressAnother telephone: 0845 1111111
</code></pre>

<p>I've checked various examples on the web, and these either deal with XML or are one-liner examples like "retrieve all links from this file". It seems that the <code>it.h3.a.@href</code> expression is collecting all hrefs in the text, even though I'm passing it a reference to the parent 'li' node.</p>

<p>Can you let me know:</p>

<ul>
<li>Why I'm getting the output shown</li>
<li>How I can retrieve the href/address pairs for each 'li' item</li>
</ul>

<p>Thanks.</p>
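<p>A likely explanation, offered as a sketch rather than a definitive answer: <code>grep</code> returns a plain list of matching nodes, so the following <code>.ol.li</code> is evaluated as a property lookup across that list and yields a one-element list holding a single GPathResult with both 'li' nodes. <code>each</code> then runs once, and <code>linkItem.h3.a.@href</code> concatenates both hrefs. Using <code>find</code> instead returns the single matching node, so <code>.ol.li</code> stays a GPathResult that <code>each</code> walks one 'li' at a time. The example below assumes Groovy 3 or later and a trimmed, already well-formed copy of the markup, so it uses a plain <code>XmlSlurper</code> without TagSoup; the <code>pairs</code> list is a hypothetical helper added only for illustration.</p>

```groovy
// Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlSlurper is a default import there).
import groovy.xml.XmlSlurper

// Assumption: trimmed, well-formed markup so plain XmlSlurper can parse it without TagSoup.
def htmlText = '''
<html><body>
  <div id="divId" class="divclass">
    <ol>
      <li><h3><a class="box" href="#href1">href1 link text</a></h3><address>Here is the address</address></li>
      <li><h3><a class="box" href="#href2">href2 link text</a></h3><address>Here is another address</address></li>
    </ol>
  </div>
</body></html>
'''

def html = new XmlSlurper().parseText(htmlText)

// find returns the first matching node instead of a list of matches,
// so .ol.li remains a GPathResult and each visits one li per iteration.
def pairs = []   // hypothetical collector, not part of the original code
html.'**'.find { it.@class == 'divclass' }.ol.li.each { linkItem ->
    def link = linkItem.h3.a.@href.text()
    def address = linkItem.address.text()
    pairs << "$link: $address".toString()
}

pairs.each { println it }
```

<p>With the TagSoup parser from the question, the same change (swapping <code>grep</code> for <code>find</code>) should apply unchanged, since only the traversal differs, not the parser.</p>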
 
