StackOverflow2013


plurals

PO Parsing html -> xml and querying with Xpath
primarykey
Id
5359805
data
AcceptedAnswerId
0
AnswerCount
2
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2011-03-19T03:10:45.087
FavoriteCount
0
LastActivityDate
2017-04-29T16:31:08.570
LastEditDate
2017-04-29T16:31:08.570
LastEditorUserId
1033581
OwnerUserId
134787
ParentId
0
PostTypeId
1
Score
7
ViewCount
1922
LastEditorDisplayName
text
Body
I want to parse an HTML page to extract some data. First, I convert it to an XML document using SgmlReader. Then I load the result into an XmlDocument and navigate it with XPath: <pre><code>// contains the HTML document
var loadedFile = LoadWebPage();
...
Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
sgmlReader.DocType = "HTML";
sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
sgmlReader.InputStream = new StringReader(loadedFile);

XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.XmlResolver = null;
doc.Load(sgmlReader);
</code></pre> This code works fine in most cases, but not on this site: <a href="http://www.arrow.com/" rel="nofollow noreferrer">www.arrow.com</a> (try searching for something like OP295GS). I can get the results table using the following XPath: <pre><code>var node = doc.SelectSingleNode(".//*[@id='results-table']");
</code></pre> This gives me a node with several child nodes: <pre><code>[0] {Element, Name="thead"}
[1] {Element, Name="tbody"}
[2] {Element, Name="tbody"}
FirstChild {Element, Name="thead"}
</code></pre> OK, let's try to get some of the child nodes using XPath. This doesn't work: <pre><code>var childNodes = node.SelectNodes("tbody");
// childNodes.Count = 0
</code></pre> Neither does this: <pre><code>var childNode = node.SelectSingleNode("thead");
// childNode = null
</code></pre> Nor even this: <pre><code>var childNode = doc.SelectSingleNode(".//*[@id='results-table']/thead");
</code></pre> What could be wrong with the XPath queries? <hr> I've just tried parsing that HTML page with Html Agility Pack, and my XPath queries work fine there. But my application uses XmlDocument internally, so Html Agility Pack doesn't suit me.
<hr> I even tried the following trick with Html Agility Pack, but the XPath queries don't work in this case either: <pre><code>// let's parse and convert the HTML document using Html Agility Pack
// and then load the result into an XmlDocument
HtmlDocument xmlDocument = new HtmlDocument();
xmlDocument.OptionOutputAsXml = true;
xmlDocument.Load(new StringReader(webPage));

XmlDocument document = new XmlDocument();
document.LoadXml(xmlDocument.DocumentNode.InnerHtml);
</code></pre> Perhaps the web page contains errors (not all tags are closed, and so on). Even so, I can see the child nodes (through Quick Watch in Visual Studio), but I cannot access them through XPath. <hr> My XPath queries work correctly in Firefox with the FirePath and XPather plugins, but they don't work with the .NET XmlDocument :(
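One possible explanation, offered as an assumption rather than a confirmed diagnosis of this page: if the converted document declares a default namespace (for example the XHTML namespace on the root element), an unprefixed XPath name test such as tbody matches only elements in no namespace, while a wildcard such as * still matches. That would be consistent with the @id lookup succeeding and the thead/tbody lookups failing. The sketch below uses a small hypothetical inline document in place of the real page; the prefix x is arbitrary, and the namespace URI is assumed.

```csharp
using System;
using System.Xml;

class NamespaceXPathSketch
{
    static void Main()
    {
        // Hypothetical stand-in for the converted page: the root element
        // declares a default namespace (assumed here to be XHTML).
        const string xml =
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>" +
            "<table id='results-table'><thead/><tbody/><tbody/></table>" +
            "</body></html>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // The wildcard ignores the element name, so this lookup succeeds.
        XmlNode table = doc.SelectSingleNode(".//*[@id='results-table']");

        // An unprefixed name test only matches elements in no namespace.
        Console.WriteLine(table.SelectNodes("tbody").Count);        // prints 0

        // Option 1: bind a prefix to the namespace and use it in the query.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");
        Console.WriteLine(table.SelectNodes("x:tbody", ns).Count);  // prints 2

        // Option 2: match on the local name only, ignoring the namespace.
        Console.WriteLine(table.SelectNodes("*[local-name()='tbody']").Count); // prints 2
    }
}
```

Checking node.NamespaceURI on the table node in the debugger would show whether this applies to the real document: an empty string would rule this explanation out.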
Tags
<c#><.net><xml><html-parsing>
Title
Parsing html -> xml and querying with Xpath
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PT Question
UserLastEditorUserId
1. US Cœur
UserOwnerUserId
1. US mlurker
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LT Linked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PT Answer
2. PO
 singulars
 PostTypePostTypeId
 PT Answer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO Parsing html -> xml and querying with Xpath
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
2. VO
 singulars
 PostPostId
 PO Parsing html -> xml and querying with Xpath
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
3. VO
 singulars
 PostPostId
 PO Parsing html -> xml and querying with Xpath
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId
1. CO +1 for a good question, and for parsing HTML with the Agility Pack and XML parsers rather than regex.
 singulars
 PostPostId
 PO Parsing html -> xml and querying with Xpath
 UserUserId
 US Justin Morgan
2. CO HTML Agility Pack is easy to use, but it has its own data types, which can be a problem when integrating it into existing logic.
 singulars
 PostPostId
 PO Parsing html -> xml and querying with Xpath
 UserUserId
 US mlurker
