
How do you handle arbitrary namespaces when querying over Linq to XML?
<p>I have a project where I am taking some particularly ugly "live" HTML and forcing it into a formal XML DOM with the HTML Agility Pack. What I would like to do is then query over this with Linq to XML so that I can scrape out the bits I need. I'm using the method described <a href="http://vijay.screamingpens.com/archive/2008/05/26/linq-amp-lambda-part-3-html-agility-pack-to-linq.aspx" rel="nofollow noreferrer">here</a> to parse the HtmlDocument into an XDocument, but when trying to query over this I'm not sure how to handle namespaces. In one particular document the original HTML was actually poorly formatted XHTML with the following tag:</p>
<pre><code>&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"&gt; </code></pre>
<p>When trying to query this document, it seems that the namespace declaration is preventing me from doing something like:</p>
<pre><code>var x = xDoc.Descendants("div"); // returns no elements </code></pre>
<p>Apparently for those "div" tags only the LocalName is "div", but the full element name is the namespace plus "div". I have tried to do some research on the issue of XML namespaces and it seems that I can bypass the namespace by querying this way:</p>
<pre><code>var divs = (from el in xDoc.Descendants() where el.Name.LocalName == "div" select el); // works </code></pre>
<p>However, this seems like a rather hacky solution and does not properly address the namespace issue. As I understand it, a proper XML document can contain multiple namespaces, and therefore the proper way to handle it should be to parse out the namespaces I'm querying under. Has anyone else ever had to do this? Am I just making it way too complicated? I know that I could avoid all this by just sticking with HtmlDocument and querying with XPath, but I would rather stick to what I know (Linq) if possible, and I would also prefer to know that I am not setting myself up for further namespace-related issues down the road.</p>
<p>What is the proper way to deal with namespaces in this situation?</p>
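To make the namespace behaviour described above concrete, here is a minimal sketch of querying with a namespace-qualified name. It is not taken from the linked article. The sample markup and the variable names (`xhtml`, `ns`, `byKnownNamespace`, and so on) are assumptions for illustration, and the parsed string merely stands in for the XDocument produced from the HtmlDocument.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample document; a stand-in for the converted HtmlDocument.
var xDoc = XDocument.Parse(
    "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">" +
    "<body><div>first</div><div>second</div></body></html>");

// Option 1: the namespace is known up front, so build the qualified name directly.
XNamespace xhtml = "http://www.w3.org/1999/xhtml";
var byKnownNamespace = xDoc.Descendants(xhtml + "div").ToList();

// Option 2: the namespace is arbitrary, so read the default namespace off the root.
// Falls back to "no namespace" if the document has no root element.
XNamespace ns = xDoc.Root?.GetDefaultNamespace() ?? XNamespace.None;
var byRootNamespace = xDoc.Descendants(ns + "div").ToList();

// The unqualified name matches nothing here: an empty sequence, not null.
var unqualified = xDoc.Descendants("div").ToList();

Console.WriteLine(byKnownNamespace.Count); // 2
Console.WriteLine(byRootNamespace.Count);  // 2
Console.WriteLine(unqualified.Count);      // 0
```

Reading the namespace from the root only covers documents with a single default namespace. If elements in several namespaces need to match, filtering on `Name.LocalName` as in the question remains a reasonable fallback.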
 
