
LeaseExpiredException while running Oozie fork
<p>We are trying to run an <code>Oozie</code> workflow with 3 sub-workflows running in parallel using <code>fork</code>. Each sub-workflow contains a node running a native map-reduce job and two subsequent nodes running complex <code>Pig</code> jobs. Finally, the three sub-workflows are joined into a single <code>end</code> node.</p> <p>When we run this workflow, we get a <code>LeaseExpiredException</code>. The exception occurs at random points while the <code>Pig</code> jobs are running; there is no definite place where it occurs, but it happens every time we run the workflow.</p> <p>Also, if we remove the <code>fork</code> and run the sub-workflows sequentially, everything works fine. However, our expectation is to run them in parallel and save some execution time.</p> <p>Can you please help us understand this issue and give some pointers on where we could be going wrong? We are just starting with <code>Hadoop</code> development and haven't faced such an issue before.</p> <p>It looks like, with several tasks running in parallel, one thread closed a part file, and when another thread tried to close the same file, the error was thrown.</p> <p>Following is the stack trace of the exception from the Hadoop logs:</p> <pre><code>2013-02-19 10:23:54,815 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher: 57% complete
2013-02-19 10:26:55,361 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher: 59% complete
2013-02-19 10:27:59,666 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file &lt;hdfspath&gt;/oozie-oozi/0000105-130218000850190-oozie-oozi-W/aggregateData--pig/output/_temporary/_attempt_201302180007_0380_m_000000_0/part-00000 :
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on &lt;hdfspath&gt;/oozie-oozi/0000105-130218000850190-oozie-oozi-W/aggregateData--pig/output/_temporary/_attempt_201302180007_0380_m_000000_0/part-00000 File does not exist.
Holder DFSClient_attempt_201302180007_0380_m_000000_0 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1664)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1655)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1710)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1698)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:793)
    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1439)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1435)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1433)
</code></pre> <p>Following are samples of the main workflow and one sub-workflow.</p> <p>Main workflow:</p> <pre><code>&lt;workflow-app xmlns="uri:oozie:workflow:0.2" name="MainProcess"&gt;
    &lt;start to="forkProcessMain"/&gt;
    &lt;fork name="forkProcessMain"&gt;
        &lt;path start="Proc1"/&gt;
        &lt;path start="Proc2"/&gt;
        &lt;path start="Proc3"/&gt;
    &lt;/fork&gt;
    &lt;join name="joinProcessMain" to="end"/&gt;
    &lt;action name="Proc1"&gt;
        &lt;sub-workflow&gt;
            &lt;app-path&gt;${nameNode}${wfPath}/proc1_workflow.xml&lt;/app-path&gt;
            &lt;propagate-configuration/&gt;
        &lt;/sub-workflow&gt;
        &lt;ok to="joinProcessMain"/&gt;
        &lt;error to="fail"/&gt;
    &lt;/action&gt;
    &lt;action name="Proc2"&gt;
        &lt;sub-workflow&gt;
            &lt;app-path&gt;${nameNode}${wfPath}/proc2_workflow.xml&lt;/app-path&gt;
            &lt;propagate-configuration/&gt;
        &lt;/sub-workflow&gt;
        &lt;ok to="joinProcessMain"/&gt;
        &lt;error to="fail"/&gt;
    &lt;/action&gt;
    &lt;action name="Proc3"&gt;
        &lt;sub-workflow&gt;
            &lt;app-path&gt;${nameNode}${wfPath}/proc3_workflow.xml&lt;/app-path&gt;
            &lt;propagate-configuration/&gt;
        &lt;/sub-workflow&gt;
        &lt;ok to="joinProcessMain"/&gt;
        &lt;error to="fail"/&gt;
    &lt;/action&gt;
    &lt;kill name="fail"&gt;
        &lt;message&gt;WF Failure, 'wf:lastErrorNode()' failed, error message[${wf:errorMessage(wf:lastErrorNode())}]&lt;/message&gt;
    &lt;/kill&gt;
    &lt;end name="end"/&gt;
&lt;/workflow-app&gt;
</code></pre> <p>Sub-workflow:</p> <pre><code>&lt;workflow-app xmlns="uri:oozie:workflow:0.2" name="Sub Process"&gt;
    &lt;start to="Step1"/&gt;
    &lt;action name="Step1"&gt;
        &lt;java&gt;
            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
            &lt;prepare&gt;
                &lt;delete path="${step1JoinOutputPath}"/&gt;
            &lt;/prepare&gt;
            &lt;configuration&gt;
                &lt;property&gt;
                    &lt;name&gt;mapred.queue.name&lt;/name&gt;
                    &lt;value&gt;${queueName}&lt;/value&gt;
                &lt;/property&gt;
            &lt;/configuration&gt;
            &lt;main-class&gt;com.absd.mr.step1&lt;/main-class&gt;
            &lt;arg&gt;${wf:name()}&lt;/arg&gt;
            &lt;arg&gt;${wf:id()}&lt;/arg&gt;
            &lt;arg&gt;${tbMasterDataOutputPath}&lt;/arg&gt;
            &lt;arg&gt;${step1JoinOutputPath}&lt;/arg&gt;
            &lt;arg&gt;${tbQueryKeyPath}&lt;/arg&gt;
            &lt;capture-output/&gt;
        &lt;/java&gt;
        &lt;ok to="generateValidQueryKeys"/&gt;
        &lt;error to="fail"/&gt;
    &lt;/action&gt;
    &lt;action name="generateValidQueryKeys"&gt;
        &lt;pig&gt;
            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
            &lt;prepare&gt;
                &lt;delete path="${tbValidQuerysOutputPath}"/&gt;
            &lt;/prepare&gt;
            &lt;configuration&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.tmpfilecompression&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.tmpfilecompression.codec&lt;/name&gt;
                    &lt;value&gt;lzo&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.map.compression&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.map.compression.codec&lt;/name&gt;
                    &lt;value&gt;lzo&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.compression&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.compression.codec&lt;/name&gt;
                    &lt;value&gt;lzo&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;mapred.compress.map.output&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
            &lt;/configuration&gt;
            &lt;script&gt;${pigDir}/tb_calc_valid_accounts.pig&lt;/script&gt;
            &lt;param&gt;csvFilesDir=${csvFilesDir}&lt;/param&gt;
            &lt;param&gt;step1JoinOutputPath=${step1JoinOutputPath}&lt;/param&gt;
            &lt;param&gt;tbValidQuerysOutputPath=${tbValidQuerysOutputPath}&lt;/param&gt;
            &lt;param&gt;piMinFAs=${piMinFAs}&lt;/param&gt;
            &lt;param&gt;piMinAccounts=${piMinAccounts}&lt;/param&gt;
            &lt;param&gt;parallel=80&lt;/param&gt;
        &lt;/pig&gt;
        &lt;ok to="aggregateAumData"/&gt;
        &lt;error to="fail"/&gt;
    &lt;/action&gt;
    &lt;action name="aggregateAumData"&gt;
        &lt;pig&gt;
            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
            &lt;prepare&gt;
                &lt;delete path="${tbCacheDataPath}"/&gt;
            &lt;/prepare&gt;
            &lt;configuration&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.tmpfilecompression&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.tmpfilecompression.codec&lt;/name&gt;
                    &lt;value&gt;lzo&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.map.compression&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.map.compression.codec&lt;/name&gt;
                    &lt;value&gt;lzo&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.compression&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;pig.output.compression.codec&lt;/name&gt;
                    &lt;value&gt;lzo&lt;/value&gt;
                &lt;/property&gt;
                &lt;property&gt;
                    &lt;name&gt;mapred.compress.map.output&lt;/name&gt;
                    &lt;value&gt;true&lt;/value&gt;
                &lt;/property&gt;
            &lt;/configuration&gt;
            &lt;script&gt;${pigDir}/aggregationLogic.pig&lt;/script&gt;
            &lt;param&gt;csvFilesDir=${csvFilesDir}&lt;/param&gt;
            &lt;param&gt;tbValidQuerysOutputPath=${tbValidQuerysOutputPath}&lt;/param&gt;
            &lt;param&gt;tbCacheDataPath=${tbCacheDataPath}&lt;/param&gt;
            &lt;param&gt;currDate=${date}&lt;/param&gt;
            &lt;param&gt;udfJarPath=${nameNode}${wfPath}/lib&lt;/param&gt;
            &lt;param&gt;parallel=150&lt;/param&gt;
        &lt;/pig&gt;
        &lt;ok to="loadDataToDB"/&gt;
        &lt;error to="fail"/&gt;
    &lt;/action&gt;
    &lt;kill name="fail"&gt;
        &lt;message&gt;WF Failure, 'wf:lastErrorNode()' failed, error message[${wf:errorMessage(wf:lastErrorNode())}]&lt;/message&gt;
    &lt;/kill&gt;
    &lt;end name="end"/&gt;
&lt;/workflow-app&gt;
</code></pre>
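<p>One direction we are considering (an assumption on our part, not something we have confirmed): the stack trace always names a file under a <code>_temporary/_attempt_*</code> directory, so the parallel sub-workflows might be colliding on a shared HDFS scratch path (for example a common Pig temp dir), or a speculative duplicate of a task attempt might be closing the same file first. A minimal sketch of what we would add to each <code>&lt;pig&gt;</code> action's <code>&lt;configuration&gt;</code> to rule both out (the temp-dir value built from <code>wf:id()</code> is only illustrative):</p> <pre><code>&lt;!-- Illustrative: give each sub-workflow run its own Pig scratch space
     so parallel branches never share a _temporary directory. --&gt;
&lt;property&gt;
    &lt;name&gt;pig.temp.dir&lt;/name&gt;
    &lt;value&gt;/tmp/pig/${wf:id()}&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Rule out speculative attempts closing another attempt's file. --&gt;
&lt;property&gt;
    &lt;name&gt;mapred.map.tasks.speculative.execution&lt;/name&gt;
    &lt;value&gt;false&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
    &lt;name&gt;mapred.reduce.tasks.speculative.execution&lt;/name&gt;
    &lt;value&gt;false&lt;/value&gt;
&lt;/property&gt;
</code></pre> <p>Since each sub-workflow gets its own <code>wf:id()</code>, the three branches would write their intermediate files under distinct directories even while running in parallel.</p>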