
Too many fetch failures: Hadoop on cluster (x2)
I have been using Hadoop for the last week or so (trying to get to grips with it), and although I have been able to set up a multi-node cluster (2 machines: 1 laptop and a small desktop) and retrieve results, I always seem to encounter "Too many fetch failures" when I run a Hadoop job.

An example output (on a trivial wordcount example) is:

```
hadoop@ap200:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount sita sita-output3X
11/05/20 15:02:05 INFO input.FileInputFormat: Total input paths to process : 7
11/05/20 15:02:05 INFO mapred.JobClient: Running job: job_201105201500_0001
11/05/20 15:02:06 INFO mapred.JobClient: map 0% reduce 0%
11/05/20 15:02:23 INFO mapred.JobClient: map 28% reduce 0%
11/05/20 15:02:26 INFO mapred.JobClient: map 42% reduce 0%
11/05/20 15:02:29 INFO mapred.JobClient: map 57% reduce 0%
11/05/20 15:02:32 INFO mapred.JobClient: map 100% reduce 0%
11/05/20 15:02:41 INFO mapred.JobClient: map 100% reduce 9%
11/05/20 15:02:49 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000003_0, Status : FAILED
Too many fetch-failures
11/05/20 15:02:53 INFO mapred.JobClient: map 85% reduce 9%
11/05/20 15:02:57 INFO mapred.JobClient: map 100% reduce 9%
11/05/20 15:03:10 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000002_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:14 INFO mapred.JobClient: map 85% reduce 9%
11/05/20 15:03:17 INFO mapred.JobClient: map 100% reduce 9%
11/05/20 15:03:25 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000006_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:29 INFO mapred.JobClient: map 85% reduce 9%
11/05/20 15:03:32 INFO mapred.JobClient: map 100% reduce 9%
11/05/20 15:03:35 INFO mapred.JobClient: map 100% reduce 28%
11/05/20 15:03:41 INFO mapred.JobClient: map 100% reduce 100%
11/05/20 15:03:46 INFO mapred.JobClient: Job complete: job_201105201500_0001
11/05/20 15:03:46 INFO mapred.JobClient: Counters: 25
11/05/20 15:03:46 INFO mapred.JobClient:   Job Counters
11/05/20 15:03:46 INFO mapred.JobClient:     Launched reduce tasks=1
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=72909
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Launched map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     Data-local map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=76116
11/05/20 15:03:46 INFO mapred.JobClient:   File Output Format Counters
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Written=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   FileSystemCounters
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_READ=4462381
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_READ=6950740
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=7546513
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   File Input Format Counters
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Read=6949956
11/05/20 15:03:46 INFO mapred.JobClient:   Map-Reduce Framework
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input groups=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Map output materialized bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Combine output records=201001
11/05/20 15:03:46 INFO mapred.JobClient:     Map input records=137146
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce shuffle bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce output records=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Spilled Records=507835
11/05/20 15:03:46 INFO mapred.JobClient:     Map output bytes=11435785
11/05/20 15:03:46 INFO mapred.JobClient:     Combine input records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     Map output records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=784
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input records=201001
```

I googled the problem, and the Apache people seem to suggest it could be anything from a networking problem (possibly something to do with the /etc/hosts files) to a corrupt disk on the slave nodes.

Just to add: I do see 2 "live nodes" on the NameNode admin panel (localhost:50070/dfshealth), and I see 2 nodes under the Map/Reduce admin panel as well.

Any clues as to how I can avoid these errors? Thanks in advance.

Edit 1:

The tasktracker log is at: http://pastebin.com/XMkNBJTh
The datanode log is at: http://pastebin.com/ttjR7AYZ

Many thanks.
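
Since the /etc/hosts theory above is the most commonly reported culprit for this error on small two-node clusters (especially Ubuntu ones), here is a minimal sketch of what a working hosts file tends to look like. The IP addresses and the slave name ap201 are placeholders (only ap200 appears in the output above); the key point is that each hostname must resolve to its LAN address on every node, not to a loopback address, otherwise reducers try to fetch map output from the wrong place:

```
# /etc/hosts (same on every node) - addresses are illustrative
127.0.0.1    localhost
# Avoid the Ubuntu default "127.0.1.1 <hostname>" entry; it makes
# TaskTrackers advertise a loopback address for the shuffle.
192.168.0.10 ap200    # master + slave (laptop)
192.168.0.11 ap201    # slave (desktop) - hypothetical hostname
```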
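
A quick way to check whether name resolution is the problem, again with ap201 standing in for the second node's real hostname: resolve both names from both machines and confirm the TaskTracker HTTP port (50060), which reducers fetch map output from, is reachable:

```
# Run on BOTH nodes; each name should resolve to a LAN address
# (e.g. 192.168.x.x), never 127.0.0.1 or 127.0.1.1.
getent hosts ap200
getent hosts ap201        # hypothetical slave hostname

# Reducers pull map output over HTTP from the TaskTracker on port 50060.
nc -zv ap200 50060
nc -zv ap201 50060
```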
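
If name resolution checks out and the failures are merely transient (the job above did eventually complete), the knobs people usually reach for are in conf/mapred-site.xml. A sketch with illustrative values, not a verified fix; both property names are from the 0.20.x line:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hold reducers back until most map output exists, so there is
       less shuffle traffic racing the remaining map tasks. -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.80</value>
  </property>
  <!-- More TaskTracker HTTP threads available to serve map output. -->
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
</configuration>
```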
 
