Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>I believe that your program is going to run into many problems as the input data is manipulated -- what if the case of 'title' changes, or there is a typo?</p> <p>It's not really possible to make a rigorous solution to scraping someone else's website, as they can at no notice completely change everything. Better is normally to write tolerant and flexible code that at least tries to verify that its output is sane. In this case it's probably best to iterate over the results of '//table/tr', then inside this loop, process the td elements:</p> <pre><code>import lxml.etree tree = lxml.etree.fromstring("&lt;table&gt;&lt;tr&gt;&lt;td&gt;test&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;div&gt;test2&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;") stringify = lambda x : "".join(x.xpath(".//text()")) for x in tree.xpath("//table/tr"): print "New row" for y in x.xpath("td"): print stringify(y) </code></pre> <p>Output:</p> <pre><code>New row test New row test2 </code></pre> <p>The following code will, however, get the list you ask for:</p> <pre><code>print map(stringify, tree.xpath("//table/tr/td")) </code></pre> <p>Output:</p> <pre><code>['test', 'test2'] </code></pre> <p>This will find all text elements which are at all descended from a td which is a direct descendant of a tr which is in turn a direct descendant of a table.</p> <p>(Simply asking for all text() elements will create some funny bugs when run on HTML which contains "&lt;td>Foo &lt;b>bar&lt;/b>&lt;/td>" or similar.)</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload