Why doesn't XPath work when processing an XHTML document with lxml (in Python)?
<p>I am testing against the following test document:</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
  &lt;head&gt;
    &lt;title&gt;hi there&lt;/title&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;img class="foo" src="bar.png"/&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>If I parse the document using lxml.html, I can get the IMG with an XPath just fine:</p>
<pre><code>&gt;&gt;&gt; root = lxml.html.fromstring(doc)
&gt;&gt;&gt; root.xpath("//img")
[&lt;Element img at 1879e30&gt;]
</code></pre>
<p>However, if I parse the document as XML and try to get the IMG tag, I get an empty result:</p>
<pre><code>&gt;&gt;&gt; tree = etree.parse(StringIO(doc))
&gt;&gt;&gt; tree.getroot().xpath("//img")
[]
</code></pre>
<p>I can navigate to the element directly:</p>
<pre><code>&gt;&gt;&gt; tree.getroot().getchildren()[1].getchildren()[0]
&lt;Element {http://www.w3.org/1999/xhtml}img at f56810&gt;
</code></pre>
<p>But of course that doesn't help me process arbitrary documents. I would also expect to be able to query etree for an XPath expression that directly identifies this element, which, technically, I can do:</p>
<pre><code>&gt;&gt;&gt; tree.getpath(tree.getroot().getchildren()[1].getchildren()[0])
'/*/*[2]/*'
&gt;&gt;&gt; tree.getroot().xpath('/*/*[2]/*')
[&lt;Element {http://www.w3.org/1999/xhtml}img at fa1750&gt;]
</code></pre>
<p>But that XPath is, again, not useful for parsing arbitrary documents.</p>
<p>Obviously I am missing some key issue here, but I don't know what it is. My best guess is that it has something to do with namespaces, but the only namespace defined is the default one, and I don't know what else I might need to consider with regard to namespaces.</p>
<p>So, what am I missing?</p>
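The namespace guess can be checked with a small sketch. It uses the standard library's `xml.etree.ElementTree` rather than lxml, a swap made only so the snippet has no third-party dependency; lxml's `etree` offers the same `findall` call, and its `xpath()` method accepts a `namespaces=` mapping in the same way. The prefix `x` and the trimmed copy of the test document are illustrative choices, not part of the original session.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Trimmed copy of the test document (XML declaration and DOCTYPE omitted).
doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>hi there</title></head>"
    '<body><img class="foo" src="bar.png"/></body>'
    "</html>"
)

root = ET.fromstring(doc)

# An unprefixed name in a path matches only elements in no namespace,
# so the default XHTML namespace on <html> hides every element from it.
unqualified = root.findall(".//img")

# Binding a prefix of our own choosing ("x" is arbitrary) to the XHTML
# namespace makes the element reachable again.
qualified = root.findall(".//x:img", namespaces={"x": XHTML_NS})

# Clark notation spells the namespace out inline instead of using a prefix.
clark = root.findall(".//{%s}img" % XHTML_NS)

print(unqualified)
print([el.tag for el in qualified])
print([el.get("src") for el in clark])
```

If this sketch carries over to lxml as assumed, the equivalent call in the session above would be `tree.getroot().xpath("//x:img", namespaces={"x": "http://www.w3.org/1999/xhtml"})`.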
 
