<p>Judging from the comments below, you can probably simplify this by doing the setup outside of <code>map()</code>. I saw that you pre-create some idle threads with the following code. You can move it from</p>
<pre><code>if (once) {
    for (MSLiteThread thread : Threads) {
        System.out.println("created thread");
        thread = new MSLiteThread(pile);
        thread.start();
    }
    once = false;
}
</code></pre>
<p>to <code>configure()</code>:</p>
<pre><code>public static class Map extends MapReduceBase implements Mapper&lt;LongWritable, Text, Text, Text&gt; {

    @Override
    public void configure(JobConf job) {
        for (MSLiteThread thread : Threads) {
            System.out.println("created thread");
            thread = new MSLiteThread(pile);
            thread.start();
        }
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector&lt;Text, Text&gt; output, Reporter reporter) {
    }
}
</code></pre>
<p>That way the threads are initialized once, and you no longer need the <em>'once'</em> condition check.</p>
<p>Moreover, you don't need to create idle threads like this at all. I don't know how much performance you actually gain by creating 16 idle threads up front.</p>
<p>Anyway, here is a solution (it may not be perfect).</p>
<p>You can use something like a <code>CountDownLatch</code> (<a href="http://javarevisited.blogspot.com/2012/07/countdownlatch-example-in-java.html#ixzz2ZnaE6HVM" rel="nofollow">Read more here</a>) to process your URLs in batches of N or more and block until they are done. The reason: if you hand each incoming URL record to a thread and return, the next URL is fetched immediately, and when the last URL is handled the same way, <code>map()</code> will return even though threads are still waiting in the queue. You'll inevitably get the exception you mentioned.</p>
<p>Here is an example of how you can probably block using a <code>CountDownLatch</code>.</p>
<pre><code>public static class Map extends MapReduceBase implements Mapper&lt;LongWritable, Text, Text, Text&gt; {

    @Override
    public void map(LongWritable key, Text value, OutputCollector&lt;Text, Text&gt; output, Reporter reporter)
            throws IOException {
        StringTokenizer urls = new StringTokenizer(value.toString());
        Config.LoggerProvider = LoggerProvider.DISABLED;

        // One latch count per URL, so await() blocks until every thread has finished.
        final CountDownLatch latch = new CountDownLatch(urls.countTokens());

        while (urls.hasMoreTokens()) {
            try {
                String currentUrl = urls.nextToken();
                // Create a thread for the current URL and start it.
                new Thread(new URLProcessingThread(currentUrl, latch)).start();
            } catch (Exception e) {
                e.printStackTrace();
                latch.countDown(); // this URL never started, so release its count
            }
        }

        try {
            latch.await(); // wait for all threads started above to complete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for URL threads", e);
        }
        // sleep here for some time if you wish
    }
}
</code></pre>
<p>Finally, in <code>URLProcessingThread</code>, decrement the latch counter as soon as a URL has been processed:</p>
<pre><code>import java.util.concurrent.CountDownLatch;

public class URLProcessingThread implements Runnable {
    private final CountDownLatch latch;
    private final String url;

    public URLProcessingThread(String url, CountDownLatch latch) {
        this.latch = latch;
        this.url = url;
    }

    @Override
    public void run() {
        try {
            // process url here
        } finally {
            // Always decrement, even on failure, so map() is never blocked forever.
            latch.countDown();
        }
    }
}
</code></pre>
<p><strong>Possible problems with your code:</strong> At <code>pile.addUrl(currenturl, output);</code>, when you add a new URL, all 16 threads will see the update in the meantime (I'm not very sure), because the same <strong>pile</strong> object is passed to all 16 threads. There is a chance that your URLs get re-processed, or you may get some other side effects (I'm not very sure about that).</p>
<p><strong>Other suggestion:</strong></p>
<p>Additionally, you may want to increase the map task timeout using</p>
<blockquote> <p>mapred.task.timeout</p> </blockquote>
<p>(default = 600000 ms = 10 minutes)</p>
<blockquote> <p><strong>Description:</strong> The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</p> </blockquote>
<p>You can add or override this property in mapred-site.xml.</p>
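<p>To try the blocking behaviour outside Hadoop, here is a minimal standalone sketch of the same latch pattern. It is only an illustration under assumptions: <code>LatchBatchDemo</code>, <code>processBatch</code> and <code>fetch</code> are hypothetical names that do not appear in the question, <code>fetch</code> is a placeholder that does no real work, and a fixed-size <code>ExecutorService</code> stands in for the hand-started threads.</p>

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchBatchDemo {

    // Submits every URL in the batch to a fixed pool and returns only after
    // all tasks have counted the latch down. Returns how many URLs completed.
    static int processBatch(List<String> urls, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch latch = new CountDownLatch(urls.size());
        AtomicInteger processed = new AtomicInteger();
        try {
            for (String url : urls) {
                pool.execute(() -> {
                    try {
                        fetch(url);
                        processed.incrementAndGet();
                    } finally {
                        // Count down even if fetch() fails, so await() cannot hang.
                        latch.countDown();
                    }
                });
            }
            latch.await(); // block until the whole batch is done
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    // Hypothetical placeholder for the real URL processing; does nothing.
    static void fetch(String url) {
    }

    public static void main(String[] args) throws InterruptedException {
        int done = processBatch(
                List.of("http://a.example", "http://b.example", "http://c.example"), 2);
        System.out.println("processed " + done + " urls");
    }
}
```

<p>The caller of <code>processBatch</code> plays the role of <code>map()</code>: it does not return until every task in the batch has finished, which is the property the answer relies on to avoid the exception.</p>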