HtmlAgilityPack Parallelisation vs. WinForms HtmlDocument Speed
<p>I have a program that I'm trying to make run as fast as possible. It loads a number of different websites and performs some scraping on them.</p>
<p>I used to perform the scraping with Forms.HtmlDocument (I download each page using WebRequest, then push it into a document via a WebBrowser control). However, that is impossible to parallelise cleanly, because the WebBrowser cannot be forced to update when it is not on the main thread.</p>
<p>So I decided to try the HtmlAgilityPack, thinking that perhaps I could parallelise that instead. However, I then read the following post:</p>
<p><a href="https://stackoverflow.com/questions/7734295/how-to-get-max-performance-using-parallel-for-foreach-performance-timings-incl">How to get max performance using Parallel.For/ForEach? (performance timings included)</a></p>
<p>It suggests that the HtmlAgilityPack doesn't really parallelise very well.</p>
<p>Converting all the code will take some time (due to its quirks and overall complexity), so I'd like to know whether it's worth it. If I avoid using WebGet (and instead obtain a stream using WebRequest and push that into the AgilityPack), will that give me a useful performance increase? Currently each iteration takes about 19 seconds, with most of that time spent waiting for the page to download.</p>
<p>Any other ideas are welcome. Thanks.</p>
<p>EDIT: While we're here, is either method faster than the other, even in a single-threaded environment?</p>
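The question notes that most of the roughly 19 seconds per iteration is spent waiting for pages to download. If so, the bottleneck is probably network I/O rather than parsing, and overlapping the downloads may matter more than the choice between Forms.HtmlDocument and the HtmlAgilityPack. The sketch below illustrates that idea under stated assumptions: it is written in Python rather than the question's C#, downloads are simulated with `asyncio.sleep`, and `FAKE_PAGES`, `fetch`, `scrape` and `LinkCollector` are hypothetical stand-ins, not HtmlAgilityPack or WebRequest APIs.

```python
import asyncio
import time
from html.parser import HTMLParser

# Hypothetical stand-in pages: name -> (simulated network latency in seconds, HTML body).
# Real code would download these over the network instead.
FAKE_PAGES = {
    "site-a": (0.3, '<a href="/a1">A1</a><a href="/a2">A2</a>'),
    "site-b": (0.2, "<p>no links here</p>"),
    "site-c": (0.4, '<a href="/c1">C1</a>'),
}


class LinkCollector(HTMLParser):
    """Hypothetical scraping step: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


async def fetch(name):
    """Simulated download: waits for the page's latency, then returns its HTML."""
    delay, body = FAKE_PAGES[name]
    await asyncio.sleep(delay)  # stands in for waiting on the network
    return body


def scrape(body):
    """Parse one page and return its links (cheap compared with the wait)."""
    parser = LinkCollector()
    parser.feed(body)
    parser.close()
    return parser.links


async def run_sequential(names):
    # Each download finishes before the next starts: total is about the sum of delays.
    return [scrape(await fetch(n)) for n in names]


async def run_concurrent(names):
    # All downloads wait at the same time: total is about the longest single delay.
    bodies = await asyncio.gather(*(fetch(n) for n in names))
    return [scrape(b) for b in bodies]


def timed(runner, names):
    """Run an async runner to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(runner(names))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = list(FAKE_PAGES)
    seq_links, seq_time = timed(run_sequential, names)
    con_links, con_time = timed(run_concurrent, names)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
    print(con_links)
```

With these made-up latencies, the sequential run takes roughly the sum of the delays (about 0.9 s) and the concurrent run roughly the longest single delay (about 0.4 s), while parsing is negligible in both. In .NET the analogous pattern would presumably be awaiting several downloads together (for example with `Task.WhenAll`) before handing each response to whichever parser is used.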