<p>Let's go through this step by step:</p>
<p><strong>Step 1: replicating the error</strong></p>
<p>After verifying that the XPath does indeed not return a result, I wrote a small script to see how deep the XPath gets before it breaks:</p>
<pre><code>// $fullPath holds the absolute XPath from the question,
// $xp is a DOMXPath built on the loaded DOMDocument.
$xpath = '';
foreach (explode('/', $fullPath) as $segment) {
    $xpath .= trim($segment);
    echo '-------------------------------------------', PHP_EOL,
         'Trying: ', $xpath, PHP_EOL,
         '-------------------------------------------', PHP_EOL;
    echo $xp-&gt;evaluate("string($xpath)"), PHP_EOL;
    $xpath .= '/';
}
</code></pre>
<p>The last path that still returns a result is:</p>
<pre><code>/html/body/div[4]/div[@id='content']/div[@id='mainbar']/div[@id='question']/table
</code></pre>
<hr>
<p><strong>Step 2: checking the markup</strong></p>
<p>So I checked the markup returned by <code>DOMDocument::saveHTML()</code> to see what it looks like, and there was no <code>&lt;tbody&gt;</code> <em>(reformatted for readability)</em>:</p>
<pre><code>&lt;div id="question"&gt;
  &lt;div class="everyonelovesstackoverflow" id="adzerk1"&gt;&lt;/div&gt;
  &lt;table&gt;
    &lt;tr&gt;&lt;td class="votecell"&gt;
</code></pre>
<p>I then checked the source of this very page to see whether DOM was throwing the element away or whether it really does not exist. It wasn't there. Apparently, Firebug inserts it, which would explain why you got the result with XPather (but not why you got it with YQL):</p>
<p><img src="https://i.stack.imgur.com/GZBYE.png" alt="Screenshot showing page source and apparently bugged Firebug view"></p>
<p><strong>Step 3: verification and conclusion</strong></p>
<p>I removed the <code>&lt;tbody&gt;</code> step from the XPath and reran the script. No problems: it returns "Gaby".</p>
<p>While I first suspected a bug in Firebug, Alejandro commented that the same happens in IE's Developer Tools. I then suspected that JavaScript was adding the element, but could not verify that.
After some more research, Alejandro pointed me to <a href="https://stackoverflow.com/questions/1678494/why-does-firebug-add-tbody-to-table">Why does firebug add <code>&lt;tbody&gt;</code> to <code>&lt;table&gt;</code>?</a>: it's neither Firebug nor JavaScript, but the browsers themselves.</p>
<p>So, to revise my conclusion:</p>
<p>Don't trust the markup the browser shows you, because the browser or other technologies may have modified it. DOM only sees the markup that is actually served. If you run into similar issues again, you now know how to approach them.</p>
<hr>
<p><strong>Some additional side notes</strong></p>
<p>Unless you need to modify the markup before feeding it to DOM, you do not have to use <code>file_get_contents()</code> to load the content. You can use DOM's <code>loadHTMLFile()</code> instead:</p>
<pre><code>$dom-&gt;loadHTMLFile('http://www.example.com/foo.htm');
</code></pre>
<p>Also, the proper way to suppress errors is to tell <a href="http://de2.php.net/manual/en/book.libxml.php" rel="nofollow noreferrer">libxml</a> to use its internal error handler. Instead of handling the errors, you can then simply clear them. This only affects errors related to libxml, e.g. parsing errors (as opposed to all PHP errors):</p>
<pre><code>libxml_use_internal_errors(TRUE); // before loading: collect libxml errors internally
libxml_clear_errors();            // after loading: discard the collected errors
</code></pre>
<p>Finally, XPath queries can be evaluated relative to a context node. So while the long XPath is efficient in terms of lookup time, you could simply use <code>getElementById()</code> to get the deepest known node and then run a relative XPath against it.</p>
<p>In other words:</p>
<pre><code>libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom-&gt;loadHTMLFile('http://www.example.com/foo.htm');
libxml_clear_errors();
$xp = new DOMXPath($dom); // the relative query needs an XPath object for this document
echo $xp-&gt;evaluate(
    'string(td[2]/div/a)',
    $dom-&gt;getElementById('comment-4408626')
);
</code></pre>
<p>will return "Gaby" as well.</p>
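<p>The probing loop from Step 1 can also be packaged as a reusable helper. This is a sketch that is not part of the original answer: the function name <code>longestMatchingPrefix</code> is hypothetical, and it assumes <code>$fullPath</code> is a plain absolute location path whose predicates contain no <code>/</code>. Instead of printing every intermediate result, it runs <code>DOMXPath::query()</code> on each growing prefix and returns the longest one that still matches at least one node. Against markup without a <code>&lt;tbody&gt;</code>, a path containing a <code>tbody</code> step should stop at the <code>table</code> step.</p>
<pre><code>/**
 * Hypothetical helper: return the longest leading part of an absolute
 * XPath that still matches at least one node ('' if even the first step
 * fails). Assumes no predicate contains a '/', since the path is split
 * naively on '/'.
 */
function longestMatchingPrefix(DOMXPath $xp, string $fullPath): string
{
    $matched = '';
    $prefix = '';
    foreach (explode('/', $fullPath) as $segment) {
        $segment = trim($segment);
        if ($segment === '') {
            continue; // skip the empty piece produced by the leading '/'
        }
        $prefix .= '/' . $segment;
        // query() returns false for a malformed expression
        // and an empty DOMNodeList when nothing matches.
        $nodes = $xp->query($prefix);
        if ($nodes === false || $nodes->length === 0) {
            break; // first step that no longer finds anything
        }
        $matched = $prefix;
    }
    return $matched;
}
</code></pre>
<p>Usage, with <code>$dom</code> loaded as in the snippets above: <code>echo longestMatchingPrefix(new DOMXPath($dom), $fullPath), PHP_EOL;</code>. Using <code>query()</code> rather than <code>evaluate("string(...)")</code> means that a matched element with empty text content still counts as a hit.</p>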