  1. XPath-based content extraction from HTML pages
<p>I'm trying to extract content based on a given XPath. When I want to extract just one element, there is no issue. When a list of items matches that XPath, I get the node list and can extract the values.</p>
<p>However, there are a couple of items related to each other that form a group, and that group repeats itself.</p>
<p>One option would be to get the node list of the parent nodes of all such groups and then apply a SAX-based parsing technique to extract the information. But this would introduce pattern-specific coding, and I want to make it generic. For example:</p>
<pre><code>&lt;html&gt;&lt;body&gt;
&lt;!-- ... a lot of divs and other tags ... --&gt;
&lt;div class="divclass"&gt;
  &lt;item&gt;
    &lt;item_name&gt;blah1&lt;/item_name&gt;
    &lt;item_qty&gt;1&lt;/item_qty&gt;
    &lt;item_price&gt;100&lt;/item_price&gt;
  &lt;/item&gt;
&lt;/div&gt;
&lt;div class="divclass"&gt;
  &lt;item&gt;
    &lt;item_name&gt;blah2&lt;/item_name&gt;
    &lt;item_qty&gt;2&lt;/item_qty&gt;
    &lt;item_price&gt;200&lt;/item_price&gt;
  &lt;/item&gt;
&lt;/div&gt;
&lt;div class="divclass"&gt;
  &lt;item&gt;
    &lt;item_name&gt;blah3&lt;/item_name&gt;
    &lt;item_qty&gt;3&lt;/item_qty&gt;
    &lt;item_price&gt;300&lt;/item_price&gt;
  &lt;/item&gt;
&lt;/div&gt;
&lt;/body&gt;&lt;/html&gt;
</code></pre>
<p>I could easily write code for <strong>this</strong> XML, but not generic code that could parse any given specification.</p>
<p>I should be able to create a <code>list</code> of <code>map</code> of <code>attribute-value</code> from the above.</p>
<p>Has anyone tried this?</p>
<p><strong>EDIT</strong> List of input XPaths:</p>
<pre><code>1. "html:div[@class='divclass']/item/item_name"
2. "html:div[@class='divclass']/item/item_qty"
3. "html:div[@class='divclass']/item/item_price"
</code></pre>
<p>Expected output in simple text:</p>
<pre><code>item_name:blah1;item_qty:1;item_price:100
item_name:blah2;item_qty:2;item_price:200
item_name:blah3;item_qty:3;item_price:300
</code></pre>
<p>The key point here is that if I apply each XPath separately, it fetches results vertically: the first one fetches all item_names, the second fetches all item_qtys, and so on. So I lose the correlation between these pieces.</p>
<p>I hope this clarifies my requirements.</p>
<p>Thanks, Nayn</p>
 
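<p>One way to keep the pieces correlated is to evaluate the shared part of the XPaths once to get each group node, then evaluate the remaining part of each XPath relative to that node. The following is a minimal Python sketch of that idea using the standard library's <code>xml.etree.ElementTree</code>, not a definitive implementation. It assumes the page is well-formed XML, that the <code>html:</code> prefix from the question can be dropped because the sample uses no namespace, and that no predicate contains a <code>/</code>. The helper names <code>split_common_prefix</code>, <code>extract_records</code>, and <code>format_record</code> are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the page in the question (assumption: parses as XML).
SAMPLE = """<html><body>
<div class="divclass"><item><item_name>blah1</item_name><item_qty>1</item_qty><item_price>100</item_price></item></div>
<div class="divclass"><item><item_name>blah2</item_name><item_qty>2</item_qty><item_price>200</item_price></item></div>
<div class="divclass"><item><item_name>blah3</item_name><item_qty>3</item_qty><item_price>300</item_price></item></div>
</body></html>"""

# The question's XPaths without the "html:" prefix (assumption: no namespace).
XPATHS = [
    "div[@class='divclass']/item/item_name",
    "div[@class='divclass']/item/item_qty",
    "div[@class='divclass']/item/item_price",
]


def split_common_prefix(xpaths):
    """Return (group path, field paths): the shared leading steps and the rest.

    Naive: splits on "/" and so does not handle predicates containing "/".
    """
    steps = [path.split("/") for path in xpaths]
    common = []
    for parts in zip(*steps):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    # Always leave at least one step per field so each field path is non-empty.
    n = min(len(common), min(len(s) for s in steps) - 1)
    if n == 0:
        raise ValueError("the XPaths share no common leading step")
    return "/".join(steps[0][:n]), ["/".join(s[n:]) for s in steps]


def extract_records(root, xpaths):
    """Build one dict per group node, keyed by the relative field path."""
    group_path, field_paths = split_common_prefix(xpaths)
    records = []
    for group in root.findall(".//" + group_path):
        record = {}
        for field in field_paths:
            hit = group.find(field)  # evaluated relative to this group only
            record[field] = hit.text if hit is not None else None
        records.append(record)
    return records


def format_record(record):
    """Render a record as name:value pairs joined by semicolons."""
    return ";".join(f"{name}:{value}" for name, value in record.items())


records = extract_records(ET.fromstring(SAMPLE), XPATHS)
for record in records:
    print(format_record(record))
```

<p>Because every field is looked up inside a single group node, a group with a missing field yields <code>None</code> for that field instead of shifting the values of later groups. The same two-step pattern (group XPath first, relative XPaths second) should carry over to other XPath engines, where a namespace-aware evaluator would be needed to keep the <code>html:</code> prefix.</p>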
