<p>I think the best way to do something like this using C# 5/.NET 4.5 is to use <a href="http://msdn.microsoft.com/en-us/devlabs/gg585582">TPL Dataflow</a>. There is even <a href="http://blogs.microsoft.co.il/blogs/bnaya/archive/2012/01/28/tpl-dataflow-walkthrough-part-5.aspx">a walkthrough on how to implement a web crawler using it</a>.</p>

<p>Basically, you create one "block" that takes care of downloading one URL and getting the links from it:</p>

<pre><code>var cts = new CancellationTokenSource();

Func&lt;LinkItem, Task&lt;IEnumerable&lt;LinkItem&gt;&gt;&gt; downloadFromLink =
    async link =&gt;
    {
        // WebClient is not guaranteed to be thread-safe,
        // so we shouldn't use one shared instance
        var client = new WebClient();
        string html = await client.DownloadStringTaskAsync(link.Href);

        return LinkFinder.Find(html, link.BaseURL);
    };

var linkFinderBlock = new TransformManyBlock&lt;LinkItem, LinkItem&gt;(
    downloadFromLink,
    new ExecutionDataflowBlockOptions
    { MaxDegreeOfParallelism = 4, CancellationToken = cts.Token });
</code></pre>

<p>You can set <code>MaxDegreeOfParallelism</code> to any value you want. It specifies the maximum number of URLs that can be downloaded concurrently. If you don't want to limit it at all, you can set it to <code>DataflowBlockOptions.Unbounded</code>.</p>

<p>Then you create one block that processes all the downloaded links somehow, for example by storing them all in a list. It can also decide when to cancel downloading:</p>

<pre><code>var links = new List&lt;LinkItem&gt;();

var storeBlock = new ActionBlock&lt;LinkItem&gt;(
    linkItem =&gt;
    {
        links.Add(linkItem);
        if (links.Count == maxSize)
            cts.Cancel();
    });
</code></pre>

<p>Since we didn't set <code>MaxDegreeOfParallelism</code> for this block, it defaults to 1. That means the block processes one item at a time, so using a collection that is not thread-safe should be okay here.</p>

<p>We create one more block: it will take a link from <code>linkFinderBlock</code> and pass it both to <code>storeBlock</code> and back to <code>linkFinderBlock</code>.</p>

<pre><code>var broadcastBlock = new BroadcastBlock&lt;LinkItem&gt;(li =&gt; li);
</code></pre>

<p>The lambda in its constructor is a "cloning function". You can use it to create a clone of the item if you want to, but it shouldn't be necessary here, since we don't modify the <code>LinkItem</code> after creation.</p>

<p>Now we can connect the blocks together:</p>

<pre><code>linkFinderBlock.LinkTo(broadcastBlock);
broadcastBlock.LinkTo(storeBlock);
broadcastBlock.LinkTo(linkFinderBlock);
</code></pre>

<p>Then we can start processing by giving the first item to <code>linkFinderBlock</code> (or to <code>broadcastBlock</code>, if you also want to send it to <code>storeBlock</code>):</p>

<pre><code>linkFinderBlock.Post(firstItem);
</code></pre>

<p>And finally, wait until the processing is complete. Cancelling the token makes <code>linkFinderBlock</code> finish in the canceled state, so that particular exception is expected and swallowed:</p>

<pre><code>try
{
    linkFinderBlock.Completion.Wait();
}
catch (AggregateException ex)
{
    if (!(ex.InnerException is TaskCanceledException))
        throw;
}
</code></pre>
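<p>For reference, the pieces above could be assembled into a single console program roughly like this. It is only a sketch under stated assumptions: the <code>LinkItem</code> class, the fake link generation that stands in for <code>WebClient</code> and <code>LinkFinder.Find</code>, the start URL, and <code>maxSize = 10</code> are placeholders and not part of the original code. The final <code>storeBlock.Complete()</code> step is also an addition, so that the list is only read after the block has stopped writing to it.</p>

<pre><code>using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for the LinkItem type used above.
public sealed class LinkItem
{
    public LinkItem(string href) { Href = href; }
    public string Href { get; private set; }
}

public static class CrawlerSketch
{
    public static void Main()
    {
        const int maxSize = 10; // assumed limit on stored links
        var cts = new CancellationTokenSource();

        // Assumption: instead of downloading a page, every fake page
        // "contains" two child links, so no network access is needed.
        Func&lt;LinkItem, Task&lt;IEnumerable&lt;LinkItem&gt;&gt;&gt; downloadFromLink =
            async link =&gt;
            {
                await Task.Delay(10); // simulate I/O latency
                return new[]
                {
                    new LinkItem(link.Href + "/a"),
                    new LinkItem(link.Href + "/b")
                };
            };

        var linkFinderBlock = new TransformManyBlock&lt;LinkItem, LinkItem&gt;(
            downloadFromLink,
            new ExecutionDataflowBlockOptions
            { MaxDegreeOfParallelism = 4, CancellationToken = cts.Token });

        var links = new List&lt;LinkItem&gt;();
        var storeBlock = new ActionBlock&lt;LinkItem&gt;(
            linkItem =&gt;
            {
                links.Add(linkItem);
                if (links.Count == maxSize)
                    cts.Cancel();
            });

        var broadcastBlock = new BroadcastBlock&lt;LinkItem&gt;(li =&gt; li);

        linkFinderBlock.LinkTo(broadcastBlock);
        broadcastBlock.LinkTo(storeBlock);
        broadcastBlock.LinkTo(linkFinderBlock);

        linkFinderBlock.Post(new LinkItem("http://example.invalid"));

        try
        {
            linkFinderBlock.Completion.Wait();
        }
        catch (AggregateException ex)
        {
            if (!(ex.InnerException is TaskCanceledException))
                throw;
        }

        // Let storeBlock drain whatever it already accepted
        // before the list is read from this thread.
        storeBlock.Complete();
        storeBlock.Completion.Wait();

        Console.WriteLine("Stored {0} links", links.Count);
    }
}
</code></pre>

<p>Because items that were already in flight when the token was cancelled may still reach <code>storeBlock</code>, the final count can be somewhat larger than <code>maxSize</code>. Note also that the fake link source never runs dry; with a finite set of pages that yields fewer than <code>maxSize</code> links, nothing would ever cancel or complete <code>linkFinderBlock</code>, and the wait would not return.</p>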