Parsing HTML -> XML and querying with XPath
<p>I want to parse an HTML page to extract some data. First, I convert it to an XML document using <em>SgmlReader</em>. Then I load the result into an XmlDocument and navigate it with XPath:</p>
<pre><code>// contains the HTML document
var loadedFile = LoadWebPage();
...
Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
sgmlReader.DocType = "HTML";
sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
sgmlReader.InputStream = new StringReader(loadedFile);

XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.XmlResolver = null;
doc.Load(sgmlReader);
</code></pre>
<p>This code works fine in most cases, except on this site: <a href="http://www.arrow.com/" rel="nofollow noreferrer">www.arrow.com</a> (try searching for something like OP295GS). I can get the results table using the following XPath:</p>
<pre><code>var node = doc.SelectSingleNode(".//*[@id='results-table']");
</code></pre>
<p>This gives me a node with several child nodes:</p>
<pre><code>[0] {Element, Name="thead"}
[1] {Element, Name="tbody"}
[2] {Element, Name="tbody"}
FirstChild {Element, Name="thead"}
</code></pre>
<p>OK, let's try to get some of the child nodes using XPath. This doesn't work:</p>
<pre><code>var childNodes = node.SelectNodes("tbody"); // childNodes.Count = 0
</code></pre>
<p>Neither does this:</p>
<pre><code>var childNode = node.SelectSingleNode("thead"); // childNode = null
</code></pre>
<p>Nor even this:</p>
<pre><code>var childNode = doc.SelectSingleNode(".//*[@id='results-table']/thead");
</code></pre>
<p>What could be wrong with my XPath queries?</p>
<hr>
<p>I've just tried parsing that HTML page with <em>Html Agility Pack</em>, and my XPath queries work fine there. But my application uses XmlDocument internally, so <em>Html Agility Pack</em> doesn't suit me.</p>
<hr>
<p>I even tried the following trick with <em>Html Agility Pack</em>, but the XPath queries don't work either:</p>
<pre><code>// let's parse and convert the HTML document using Html Agility Pack and then load
// the result into an XmlDocument
HtmlDocument xmlDocument = new HtmlDocument();
xmlDocument.OptionOutputAsXml = true;
xmlDocument.Load(new StringReader(webPage));

XmlDocument document = new XmlDocument();
document.LoadXml(xmlDocument.DocumentNode.InnerHtml);
</code></pre>
<p>Perhaps the web page contains errors (not all tags are closed, and so on). Even so, I can see the child nodes (through Quick Watch in Visual Studio) but cannot access them through XPath.</p>
<hr>
<p>My XPath queries work correctly in Firefox with the FirePath and XPather plugins, but they don't work in .NET's XmlDocument :(</p>