StackOverflow2013

plurals

PO
primarykey
Id
14284154
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2013-01-11T18:11:22.937
FavoriteCount
0
LastActivityDate
2013-01-11T23:26:15.667
LastEditDate
2013-01-11T23:26:15.667
LastEditorUserId
736079
OwnerUserId
736079
ParentId
14132847
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
Looks like you can create a .NET 4.0 project, given the .NET Framework versions you have installed on your machine. What type of project to create depends on how you'd like your application to run. I'd personally opt for a C# Class Library project that contains the code to load and scrub the HTML, and then host that in whatever mechanism you want to use to actually open the files. To open a file from the file system, use either <code>File.OpenRead</code> or <code>File.ReadAllText</code> from <a href="http://msdn.microsoft.com/en-us/library/system.io.file.aspx" rel="nofollow"><code>System.IO.File</code></a>. You can pass the stream or the file contents to the <code>HtmlDocument.Load/LoadHtml</code> methods. <pre><code>
HtmlDocument doc = new HtmlDocument();
// Use File.ReadAllText
string contents = File.ReadAllText("PathToFileName");
doc.LoadHtml(contents);
// Or use a stream
using (var stream = File.OpenRead("PathToFileName"))
{
    doc.Load(stream);
}
</code></pre> Possibilities for hosting are plentiful: a Console Application (can be invoked from the command line or through the Task Scheduler), a Windows Service (can be loaded in Windows, run in the background even when nobody is logged on to the machine, and can potentially use the <code>FileSystemWatcher</code> to automatically pick up the files), or a Windows Forms/WPF application that lets the user select the files to process and then shows the results somehow. As for how to use it, this is one of the primary issues with the Html Agility Pack. New ways of using it have been added over time, so the library now offers several APIs for the same job. You could take the old-fashioned XPath query route (which was the original API) or you can use the Linq-to-HTML/XML route (which is the newer way). Neither is better than the other; they both have their distinct advantages. 
The XPath solution allows you to store the queries in a text file easily, so it's great for a configurable system, while the Linq-to-HTML version is a little easier on the eyes from a developer perspective. As for how to download it, there are a number of options here as well. <ul> <li>You can indeed <a href="http://htmlagilitypack.codeplex.com/" rel="nofollow">download the sources from the CodePlex website</a>. Regardless of how you proceed, you might want to do that anyway. It allows you to look under the hood and figure out why something works the way it does, even if you don't compile the library yourself.</li> <li>You can download the binaries from CodePlex and store them with your project. Before the creation of services such as NuGet, this was the only easy way for developers to distribute their libraries.</li> <li>I'd personally choose to go the NuGet route. If you're using Visual Studio 2012, NuGet is already integrated with Visual Studio. If you're using Visual Studio 2010, <a href="http://docs.nuget.org/docs/start-here/installing-nuget" rel="nofollow">you'll have to install the NuGet extension</a> to get the same functionality. Once installed you can <a href="http://docs.nuget.org/docs/start-here/using-the-package-manager-console" rel="nofollow">open the NuGet Package Manager Console from within Visual Studio</a>. With a Visual Studio Solution open and your freshly created Class Library selected in the Solution Explorer, enter the <code>Install-Package HtmlAgilityPack</code> command to let Visual Studio download and install the proper version of the HTML Agility Pack for your project. No worries about which library to select, Visual Studio will do that for you.</li> </ul> How to use it now that you've installed the library depends entirely on what type of HTML scrubbing you're after and whether you choose the XPath or the Linq-to-HTML route. 
But it generally comes down to loading the HTML document: <pre><code>
HtmlDocument doc = new HtmlDocument();
doc.Load(/* path to file or stream */);
// or
doc.LoadHtml(/* string */);
</code></pre> After loading the file and checking for any parsing errors that might have occurred, proceed to query the HTML using XPath as if the contents were XML (<a href="http://msdn.microsoft.com/en-us/library/ms256115%28v=vs.100%29.aspx" rel="nofollow">the XML/XPath documentation from MSDN actually applies here</a>): <pre><code>
var nodes = doc.DocumentNode.SelectNodes("//table/tr/td");
</code></pre> Or the same query using Linq-to-HTML (note <code>SelectMany</code>, which flattens the result into a single sequence of cells): <pre><code>
var nodes = doc.DocumentNode.Descendants("table")
    .SelectMany(table => table.Elements("tr"))
    .SelectMany(tr => tr.Elements("td"));
</code></pre> Or use Linq-to-HTML with Linq query syntax: <pre><code>
var tds = from table in doc.DocumentNode.Descendants("table")
          from tr in table.Elements("tr")
          from td in tr.Elements("td")
          select td;
</code></pre> You can make the queries as wild as you want. The syntax is either similar to the standard <code>XPathNavigator</code> syntax in the .NET Framework (using <code>SelectNodes</code>/<code>SelectSingleNode</code>/<code>ChildNodes</code> etc.) or the Linq-to-XML syntax (using <code>.Descendants</code>/<code>.Ancestors</code>/<code>.Element(s)</code> and standard Linq). See also: <ul> <li><a href="http://msdn.microsoft.com/en-us/library/bb387098.aspx" rel="nofollow">Linq to XML documentation</a></li> <li><a href="http://msdn.microsoft.com/en-us/library/system.xml.xpath.xpathnavigator_methods.aspx" rel="nofollow">XPathNavigator/IXPathNavigable documentation</a></li> </ul>
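To tie the load, error-check and query steps together, here is a minimal sketch of a console program. It assumes the HtmlAgilityPack NuGet package is referenced. The sample markup and the class name <code>Program</code> are made up for illustration, and the null check reflects how <code>SelectNodes</code> behaves in the versions I'm familiar with, so treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup; a real application would read this from a file.
        const string html =
            "<html><body><table>" +
            "<tr><td>one</td><td>two</td></tr>" +
            "<tr><td>three</td></tr>" +
            "</table></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The parser records problems in ParseErrors instead of throwing.
        foreach (var error in doc.ParseErrors)
        {
            Console.WriteLine("Line {0}: {1}", error.Line, error.Reason);
        }

        // XPath route. Assumption: SelectNodes returns null when nothing matches,
        // so guard before enumerating.
        var viaXPath = doc.DocumentNode.SelectNodes("//table/tr/td");
        if (viaXPath != null)
        {
            foreach (var td in viaXPath)
            {
                Console.WriteLine("XPath: " + td.InnerText);
            }
        }

        // Linq route. SelectMany flattens tables -> rows -> cells into one sequence.
        var viaLinq = doc.DocumentNode.Descendants("table")
            .SelectMany(table => table.Elements("tr"))
            .SelectMany(tr => tr.Elements("td"));
        foreach (var td in viaLinq)
        {
            Console.WriteLine("Linq: " + td.InnerText);
        }
    }
}
```

Both routes walk the same table/tr/td path here, so they should list the same cells. Which one you keep is mostly a matter of whether you want the query to live in configuration (XPath) or in code (Linq).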
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PONeed some clarification regarding getting started with HTML Agility Pack
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USjessehouwing
UserOwnerUserId
1. USjessehouwing
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PONeed some clarification regarding getting started with HTML Agility Pack
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
