StackOverflow2013


plurals

POC# Web Parsing Conflict
primarykey
Id
7814125
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2011-10-18T21:42:22.937
FavoriteCount
1
LastActivityDate
2011-10-18T22:50:32.597
LastEditDate
LastEditorUserId
0
OwnerUserId
1002010
ParentId
0
PostTypeId
1
Score
0
ViewCount
264
LastEditorDisplayName
text
Body
I'm running into quite a few problems in a simple attempt to parse some HTML. As practice, I'm writing a multi-threaded web crawler that starts with a list of sites to crawl. This list gets handed down through a few classes, which should eventually return the content of the sites back to my system. This seems rather straightforward, but I've had no luck with either of the following tasks:

A. Convert the content of a website (as a string, read from an HttpWebRequest stream) to an HtmlDocument by using the HtmlDocument.Write() method. (I cannot create a new instance of HtmlDocument, which doesn't make much sense.)

B. Collect an HtmlDocument via a WebBrowser instance.

Here is my code as it exists; any advice would be great.

<pre><code>public void Start()
{
    if (this.RunningThread == null)
    {
        Console.WriteLine("Executing SiteCrawler for " + SiteRoot.DnsSafeHost);
        this.RunningThread = new Thread(this.Start);
        this.RunningThread.SetApartmentState(ApartmentState.STA);
        this.RunningThread.Start();
    }
    else
    {
        try
        {
            WebBrowser BrowserEmulator = new WebBrowser();
            BrowserEmulator.Navigate(this.SiteRoot);
            HtmlElementCollection LinkCollection = BrowserEmulator.Document.GetElementsByTagName("a");
            List<PageCrawler> PageCrawlerList = new List<PageCrawler>();
            foreach (HtmlElement Link in LinkCollection)
            {
                PageCrawlerList.Add(new PageCrawler(Link.GetAttribute("href"), true));
                continue;
            }
            return;
        }
        catch (Exception e)
        {
            throw new Exception("Exception encountered in SiteCrawler: " + e.Message);
        }
    }
}
</code></pre>

This code seems to do nothing when it passes over the Navigate method. I've tried allowing it to open in a new window, which pops up a new instance of IE that proceeds to navigate to the specified address, but not before my program steps over the Navigate call. I've tried waiting for the browser to be "not busy", but it never seems to report itself as busy in the first place.

I've also tried creating a new document via Browser.Document.OpenNew() so that I could populate it with data from a WebRequest stream. However, as you might expect, I get a null reference exception when I try to reach through the Document part of that statement. From my research, this appears to be the only way to create a new HtmlDocument.

As you can see, this method is intended to kick off a PageCrawler for every link in a specified page. I'm sure I could use an HttpWebRequest, collect the data from the stream, and parse the HTML character by character to find all of the links, but that is far more work than should be necessary. If anyone has any advice, it would be greatly appreciated. Thank you.
Tags
<c#><.net><httpwebrequest><browser>
Title
C# Web Parsing Conflict
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USDigitalJedi805
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POC# Web Parsing Conflict
 UserUserId
 USJames Johnson
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. COConsider using http://www.codedblog.com/2007/08/29/google-web-toolkit-and-c/
 singulars
 PostPostId
 POC# Web Parsing Conflict
 UserUserId
 USagent-j
2. COWhen I tried to test your code I got compiler errors about `this.RunningThread`, `this.SiteRoot`, `PageCrawler`, etc. I suppose you expect a guess, not a real answer.
 singulars
 PostPostId
 POC# Web Parsing Conflict
 UserUserId
 USL.B


Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data is linked in either a to-one or a to-many fashion. The caption of each to-many relation links to a list view showing the related table filtered to the matching rows.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.