<p>A brief perusal of <code>HtmlAgilityPack.HtmlWeb</code> confirms that it uses the synchronous <code>WebRequest</code> API. By running it via <code>Parallel</code>, you are therefore placing long-running, blocking tasks onto the ThreadPool. The ThreadPool is designed for short-lived operations that return the thread to the pool quickly; blocking on IO is a big no-no. Once its minimum thread count has been reached, the ThreadPool adds new threads only gradually (because it is not designed for this kind of usage), so your throughput will be constrained by this behaviour.</p>
<p>Fetch your web content asynchronously (<em>see <a href="http://msdn.microsoft.com/en-us/library/system.net.webrequest.begingetresponse.aspx" rel="noreferrer"><code>WebRequest.BeginGetResponse</code></a> and <a href="http://msdn.microsoft.com/en-us/library/system.io.stream.beginread.aspx" rel="noreferrer"><code>Stream.BeginRead</code></a> for the correct APIs to use; you'll have to investigate further yourself...</em>) so that you are not tying up the ThreadPool with blocking tasks. You can then feed the decoded response to HtmlAgilityPack for parsing (for example with <code>HtmlDocument.LoadHtml</code>).</p>
<p>If you really want to jazz up performance, you'll also need to consider that WebRequest is incapable of performing asynchronous DNS lookups. IMO this is a terrible flaw in the design of WebRequest. From the <code>BeginGetResponse</code> documentation:</p>
<blockquote> <p>The BeginGetResponse method requires some synchronous setup tasks to complete (DNS resolution, proxy detection, and TCP socket connection, for example) before this method becomes asynchronous. </p> </blockquote>
<p>This makes high-performance downloading a real PITA. It's at about this point that you might consider writing your own HTTP library so that everything, DNS resolution included, can execute without blocking (and therefore without starving the ThreadPool).</p>
<p>As an aside, getting maximum throughput when churning through web pages is a tricky affair. In my experience, you get the code right and are then let down by the routing equipment the traffic has to pass through. Many domestic routers simply aren't up to the job.</p>
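<p>The advice above is specific to .NET. As a hedged, language-neutral illustration of the same pattern (fetch asynchronously, then parse, with no thread blocked on IO and a cap on requests in flight), here is a minimal sketch using Python's <code>asyncio</code>. This is a swapped-in technique, not the <code>WebRequest</code> or HtmlAgilityPack API: <code>crawl</code>, <code>parse_title</code>, <code>TitleParser</code>, <code>max_in_flight</code> and the <code>fetch</code> callable are hypothetical names, and the standard library's <code>html.parser</code> stands in for HtmlAgilityPack.</p>

```python
import asyncio
from html.parser import HTMLParser
from typing import Awaitable, Callable, Dict, Iterable, Optional

# Hypothetical fetcher signature: URL in, decoded page body out.
# A real implementation must use non-blocking network I/O, never block the thread.
Fetcher = Callable[[str], Awaitable[str]]


class TitleParser(HTMLParser):
    """Stand-in for the parsing step (the answer uses HtmlAgilityPack)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>.
        if self._in_title:
            self.title = (self.title or "") + data


def parse_title(html: str) -> Optional[str]:
    """Return the page title, or None if the document has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


async def crawl(urls: Iterable[str], fetch: Fetcher,
                max_in_flight: int = 8) -> Dict[str, Optional[str]]:
    """Fetch all URLs concurrently on one thread, then parse each body.

    While one download awaits the network, the event loop runs the others,
    so no thread sits blocked on I/O. The semaphore caps how many requests
    are in flight at once (e.g. to spare a weak home router).
    """
    gate = asyncio.Semaphore(max_in_flight)

    async def one(url: str):
        async with gate:
            body = await fetch(url)
        # Parsing is CPU work; it runs after the download slot is released.
        return url, parse_title(body)

    return dict(await asyncio.gather(*(one(u) for u in urls)))
```

<p>In real use, <code>fetch</code> would wrap a non-blocking HTTP client. Note that asyncio runs into the same DNS problem described above: the system resolver is blocking, so <code>loop.getaddrinfo</code> hands the lookup to a thread pool executor rather than resolving on the event loop thread. The fixed <code>max_in_flight</code> is an assumption; tune it to what your network equipment can actually sustain.</p>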