<p>Okay, first, with regard to parsing the HTML: if you follow the recommendation of zweiterlinde and S.Lott, at least use the version of <a href="http://lxml.de/elementsoup.html" rel="nofollow noreferrer">BeautifulSoup included with lxml</a>. That way you will also reap the benefit of a nice xpath or css selector interface.</p>
<p>However, I personally prefer Ian Bicking's <a href="http://lxml.de/lxmlhtml.html" rel="nofollow noreferrer">HTML parser included in lxml</a>.</p>
<p>Secondly, <code>.find()</code> and <code>.findall()</code> come from lxml trying to be compatible with ElementTree, and those two methods are described in <a href="http://effbot.org/zone/element-xpath.htm" rel="nofollow noreferrer">XPath Support in ElementTree</a>.</p>
<p>Those two functions are fairly easy to use, but they support only a very limited subset of XPath. I recommend using either the full lxml <a href="http://lxml.de/xpathxslt.html#the-xpath-method" rel="nofollow noreferrer"><code>xpath()</code> method</a> or, if you are already familiar with CSS, the <a href="http://lxml.de/cssselect.html" rel="nofollow noreferrer"><code>cssselect()</code> method</a>.</p>
<p>Here are some examples, with an HTML string parsed like this:</p>
<pre><code>from lxml.html import fromstring

mySearchTree = fromstring(your_input_string)
</code></pre>
<p>Using the css selector class, your program would look roughly like this:</p>
<pre><code># Find all 'a' elements inside 'tr' table rows with css selector
for a in mySearchTree.cssselect('tr a'):
    print('found "%s" link to href "%s"' % (a.text, a.get('href')))
</code></pre>
<p>The equivalent using the xpath method would be:</p>
<pre><code># Find all 'a' elements inside 'tr' table rows with xpath
for a in mySearchTree.xpath('.//tr//a'):
    print('found "%s" link to href "%s"' % (a.text, a.get('href')))
</code></pre>
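<p>To make the "limited XPath" point concrete, here is a minimal sketch that uses the standard library's <code>xml.etree.ElementTree</code> as a stand-in for lxml's ElementTree-compatible <code>.findall()</code>. The sample markup and the variable names are hypothetical, chosen only for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs valid XML,
# unlike lxml.html, which tolerates broken HTML).
SAMPLE = """
<table>
  <tr><td><a href="/docs/intro.html">Intro</a></td></tr>
  <tr><td><span><a href="/blog/post.html">Post</a></span></td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# The limited path language handles tag steps, '//' and simple
# attribute predicates such as [@href] or [@href='...'].
links = [(a.text, a.get("href")) for a in root.findall(".//tr//a")]

# XPath functions such as contains() are not part of that subset, so
# anything beyond exact matches is filtered in plain Python here.
doc_links = [(text, href) for text, href in links if "/docs/" in href]

for text, href in links:
    print('found "%s" link to href "%s"' % (text, href))
```

<p>With the full <code>xpath()</code> method, a predicate such as <code>contains(@href, '/docs/')</code> could express the same filter inside the query itself, which is the main reason to prefer it over <code>.findall()</code>.</p>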