
How to do multi-threading with asynchronous webrequests

I'm trying to implement a .NET 4 helper/utility class for a web-testing tool that retrieves HTML page sources based on a list of URLs. The solution should be scalable and have high performance.

I have been researching and trying different solutions for many days, but cannot find a proper one.

Based on my understanding, the best way to achieve my goal would be asynchronous web requests running in parallel using the TPL.

In order to have full control over headers etc., I'm using HttpWebRequest/HttpWebResponse instead of WebClient, which wraps them. In some cases the output should be chained to other tasks, so using TPL tasks could make sense.

What I have achieved so far after many different trials/approaches:

1. Implemented basic synchronous, asynchronous (APM) and parallel (TPL tasks) solutions to compare the performance of the different approaches.

2. To see the performance of the asynchronous parallel solution I used the APM approach (BeginGetResponse and BeginRead) and ran it in Parallel.ForEach. Everything works and I'm happy with the performance, but a plain Parallel.ForEach doesn't feel like the right way to go, and for example I don't know how I would use task chaining with it.

3. Then I tried a more sophisticated approach: wrapping the APM calls in tasks with TaskCompletionSource and using an iterator to step through the APM flow (minimal sketches of such a wrapper and of the Iterate helper follow the question). I believe this could be what I'm looking for, but there is a strange delay of roughly 6-10 s that occurs 2-3 times when running a list of 500 URLs.

   Based on the logs, execution has returned to the thread that calls the async fetch in a loop when the delay happens. The delay does not occur every time execution moves back to the loop, just 2-3 times; otherwise it works fine. It looks as if the looping thread creates a set of tasks that are processed by other threads, and while most/all of those tasks are completed there is a delay (6-8 s) before the loop continues creating the remaining tasks and the other threads become active again.

The principle of the iterator inside the loop is:

```
IEnumerable<Task> DoExample(string input)
{
    var aResult = DoAAsync(input);
    yield return aResult;
    var bResult = DoBAsync(aResult.Result);
    yield return bResult;
    var cResult = DoCAsync(bResult.Result);
    yield return cResult;
    …
}

Task t = Iterate(DoExample("42"));
```

I'm resolving the connection limit with System.Net.ServicePointManager.DefaultConnectionLimit and the timeout with ThreadPool.RegisterWaitForSingleObject.

My question is simply: what would be the best approach to implement a helper/utility class for retrieving HTML pages that would:

- be scalable and have high performance
- use webrequests
- be easily chained to other tasks
- be able to use a timeout
- use the .NET 4 framework

If you think that the APM + TaskCompletionSource + iterator solution presented above is fine, I would appreciate any help with solving the delay problem.

I'm totally new to C# and Windows development, so please bear with me if something I'm trying doesn't make much sense.

Any help would be highly appreciated, as without getting this solved I will have to drop my test tool development.

Thanks
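
For reference, here is a minimal sketch of the kind of wrapping described in point 3 above: turning the BeginGetResponse/EndGetResponse pair into a Task<string> on .NET 4 via Task.Factory.FromAsync (which builds on a TaskCompletionSource internally). The class and method names (PageFetcher, FetchPageAsync) are made up for illustration and are not part of the framework or the original code.

```
using System.IO;
using System.Net;
using System.Threading.Tasks;

static class PageFetcher
{
    // Illustrative helper, not a framework API.
    public static Task<string> FetchPageAsync(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);

        // FromAsync pairs the Begin/End methods and returns a Task<WebResponse>
        // that completes when the response headers have arrived.
        return Task.Factory
            .FromAsync<WebResponse>(request.BeginGetResponse,
                                    request.EndGetResponse,
                                    null)
            .ContinueWith(t =>
            {
                // Runs on a thread-pool thread once the response is available.
                // Reading the body synchronously keeps the sketch short; a fully
                // asynchronous version would also wrap BeginRead/EndRead.
                using (var response = t.Result)
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            });
    }
}
```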
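The Iterate helper used in the iterator snippet above is not shown in the question. A simplified version of the usual iterator-driving pattern, without cancellation, enumerator disposal, or error handling around MoveNext, could look like this:

```
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class TaskIterators
{
    // Drives an IEnumerable<Task>: each yielded task is awaited via a
    // continuation before the iterator is advanced to the next step.
    public static Task Iterate(IEnumerable<Task> asyncSteps)
    {
        var enumerator = asyncSteps.GetEnumerator();
        var tcs = new TaskCompletionSource<object>();
        Action<Task> step = null;

        step = previous =>
        {
            // Propagate a fault from the previous step and stop iterating.
            if (previous != null && previous.IsFaulted)
            {
                tcs.TrySetException(previous.Exception.InnerExceptions);
                return;
            }

            // Advance the iterator; when it ends, the overall task completes.
            if (!enumerator.MoveNext())
            {
                tcs.TrySetResult(null);
                return;
            }

            // Continue the chain when the currently yielded task finishes.
            enumerator.Current.ContinueWith(step, TaskContinuationOptions.ExecuteSynchronously);
        };

        step(null);
        return tcs.Task;
    }
}
```

The returned task completes only after the whole chain has run, so it can itself be chained to further tasks.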
 

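For the connection limit and timeout handling mentioned in the question, a common shape (again only a sketch; StartRequestWithTimeout is an illustrative name) combines ServicePointManager.DefaultConnectionLimit with ThreadPool.RegisterWaitForSingleObject on the request's wait handle:

```
using System;
using System.Net;
using System.Threading;

static class RequestHelpers
{
    public static void StartRequestWithTimeout(string url, int timeoutMs)
    {
        // Raise the per-host connection cap once, e.g. at application start-up.
        ServicePointManager.DefaultConnectionLimit = 100;

        var request = (HttpWebRequest)WebRequest.Create(url);

        IAsyncResult result = request.BeginGetResponse(ar =>
        {
            try
            {
                using (var response = (HttpWebResponse)request.EndGetResponse(ar))
                {
                    // ... read and process the response stream here ...
                }
            }
            catch (WebException)
            {
                // Aborted (timed out) or otherwise failed requests end up here.
            }
        }, null);

        // If the wait handle is not signalled within timeoutMs, abort the
        // request; EndGetResponse then throws and the callback above handles it.
        ThreadPool.RegisterWaitForSingleObject(
            result.AsyncWaitHandle,
            (state, timedOut) => { if (timedOut) ((HttpWebRequest)state).Abort(); },
            request,
            timeoutMs,
            true);
    }
}
```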

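Finally, an illustrative driver showing how per-URL fetch tasks can be started and chained to further processing without Parallel.ForEach; it assumes the hypothetical PageFetcher.FetchPageAsync helper sketched earlier:

```
using System;
using System.Linq;
using System.Threading.Tasks;

static class FetchDriver
{
    static void Run(string[] urls)
    {
        // Start one task per URL and chain a follow-up step to each fetch.
        Task<int>[] pipeline = urls
            .Select(url => PageFetcher.FetchPageAsync(url)
                .ContinueWith(t =>
                {
                    // Chain further processing here (parsing, validation, ...).
                    return t.Result.Length;
                }))
            .ToArray();

        // Block only at the very end, e.g. in a console test host.
        Task.WaitAll(pipeline);
        Console.WriteLine("Fetched {0} pages.", pipeline.Length);
    }
}
```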