IOException during reduce phase when remotely executing Hadoop job
<p>I've got a small 10-node Hadoop cluster running 1.0.4, and I'm trying to set it up so that I can submit jobs from machines on the network other than the NameNode. I've got a simple example set up in which I execute the job using <a href="http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/util/ToolRunner.html#run%28org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.util.Tool,%20java.lang.String%5B%5D%29" rel="nofollow"><code>ToolRunner</code></a>, build the <a href="http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html" rel="nofollow"><code>JobConf</code></a> manually, and submit with <a href="http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobClient.html#submitJob%28org.apache.hadoop.mapred.JobConf%29" rel="nofollow"><code>JobClient.submitJob()</code></a>. Everything works as expected when I run this from the NameNode.</p>

<p>When I run it from any other node on the network, the job is submitted and all map tasks complete successfully, but every reduce task fails with the following exception:</p>

<pre><code>org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/map_0.out in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
    at org.apache.hadoop.mapred.MapOutputFile.getInputFile(MapOutputFile.java:161)
    at org.apache.hadoop.mapred.ReduceTask.getMapFiles(ReduceTask.java:220)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:398)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
</code></pre>

<p>I take this to mean that the reduce tasks can't find the output from the mappers. I'm fairly certain I'm just missing a config value somewhere, but I can't figure out which one (I've tried <code>mapred.local.dir</code> and <code>hadoop.tmp.dir</code> with no success). Does anyone know exactly what the above message means and how to fix it, or know a simple way to execute jobs from machines other than the NameNode?</p>

<p><strong>Edit</strong>: I think this may also have something to do with permissions. The <code>hadoop</code> user owns pretty much all files on HDFS, but when I'm logged in on a different machine it's under a different username. I've tried updating <code>mapred-site.xml</code> on all the nodes in the cluster similar to <a href="http://hadoop.apache.org/docs/stable/Secure_Impersonation.html" rel="nofollow">this</a>, and wrapping <code>JobClient.submitJob()</code> inside a <code>UserGroupInformation.doAs()</code>, but I still get an error similar to:</p>

<pre><code>SEVERE: PriviledgedActionException as:hadoop via oren cause:org.apache.hadoop.ipc.RemoteException: User: oren is not allowed to impersonate hadoop
</code></pre>
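<p>For reference, here is roughly what the driver described above looks like. This is a minimal sketch rather than my exact code: the <code>namenode:9000</code>/<code>namenode:9001</code> addresses are placeholders for my cluster, and the identity mapper/reducer are just there to keep the example self-contained.</p>

<pre><code>import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RemoteSubmit extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), RemoteSubmit.class);
        // Point the client at the remote cluster (placeholder host/ports).
        conf.set("fs.default.name", "hdfs://namenode:9000");
        conf.set("mapred.job.tracker", "namenode:9001");
        // I've also tried overriding mapred.local.dir and hadoop.tmp.dir
        // here, with no luck.
        conf.setJobName("remote-submit-test");
        // Identity mapper/reducer; TextInputFormat (the default) produces
        // LongWritable keys and Text values, so the output types match.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Submit asynchronously, then block until the job finishes.
        RunningJob job = new JobClient(conf).submitJob(conf);
        job.waitForCompletion();
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new RemoteSubmit(), args));
    }
}
</code></pre>

<p>Launched from the NameNode (e.g. <code>hadoop jar remote-submit.jar RemoteSubmit in out</code>, jar name hypothetical) this completes normally; launched from any other machine, the maps finish and the reducers die with the <code>DiskErrorException</code> above.</p>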
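<p>And here is the <code>doAs()</code> wrapper mentioned in the edit, again as a sketch of what I'm attempting rather than the exact code (<code>oren</code> is my local login, <code>hadoop</code> is the cluster superuser):</p>

<pre><code>import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxySubmit {
    public static RunningJob submitAsHadoop(final JobConf conf) throws Exception {
        // Act as the "hadoop" user, proxied through whoever is logged in
        // locally (in my case, "oren").
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
                "hadoop", UserGroupInformation.getLoginUser());
        return ugi.doAs(new PrivilegedExceptionAction&lt;RunningJob&gt;() {
            public RunningJob run() throws Exception {
                return new JobClient(conf).submitJob(conf);
            }
        });
    }
}
</code></pre>

<p>The config change referred to above is the <code>hadoop.proxyuser.*.hosts</code>/<code>hadoop.proxyuser.*.groups</code> pattern from the linked impersonation doc; even with that in place on every node, I still get the <code>RemoteException</code> above.</p>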
 
