When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?
By default, Hadoop sets hadoop.tmp.dir to your /tmp folder. This is a problem, because /tmp gets wiped out by Linux when you reboot, leading to this lovely error from the JobTracker:

    2012-10-05 07:41:13,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
    ...
    2012-10-05 07:41:22,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
    2012-10-05 07:41:22,643 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
    java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)

The only way I've found to fix this is to reformat the name node, which rebuilds the /tmp/hadoop-root folder, which of course gets wiped out again when you reboot.

So I went ahead and created a folder called /hadoop_temp and gave all users read/write access to it. I then set this property in my core-site.xml:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>file:///hadoop_temp</value>
    </property>

When I re-formatted my namenode, Hadoop seemed happy, giving me this message:

    12/10/05 07:58:54 INFO common.Storage: Storage directory file:/hadoop_temp/dfs/name has been successfully formatted.

However, when I looked at /hadoop_temp, I noticed that the folder was empty. And then when I restarted Hadoop and checked my JobTracker log, I saw this:

    2012-10-05 08:02:41,988 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
    ...
    2012-10-05 08:02:51,010 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
    2012-10-05 08:02:51,011 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
    java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused

And when I checked my namenode log, I saw this:

    2012-10-05 08:00:31,206 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name does not exist.
    2012-10-05 08:00:31,212 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
    org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

So, clearly I didn't configure something right. Hadoop still expects to see its files under /tmp even though I set hadoop.tmp.dir to /hadoop_temp in core-site.xml. What did I do wrong? What's the accepted "right" value for hadoop.tmp.dir?

Bonus question: what should I use for hbase.tmp.dir?

System info: Ubuntu 12.04, Apache Hadoop 0.20.2, Apache HBase 0.92.1.

Thanks for taking a look!
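
For reference, hadoop.tmp.dir is conventionally set to a bare local filesystem path rather than a file:// URI; the namenode log above shows the URI being treated as a relative path and prefixed with the install directory (/opt/hadoop/hadoop-0.20.2/file:/hadoop_temp). A minimal core-site.xml sketch along those lines, reusing the /hadoop_temp directory from the question and assuming the localhost:8020 namenode address seen in the logs:

    <configuration>
      <!-- Bare path, no file:// scheme: hadoop.tmp.dir is a plain
           local-filesystem path, not a URI. -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop_temp</value>
      </property>
      <!-- Pseudo-distributed namenode address; localhost:8020 matches
           the port the JobTracker retries in the logs above. -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
      </property>
    </configuration>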
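
As for the bonus question, hbase.tmp.dir plays the analogous role for HBase and likewise defaults to a directory under /tmp, so the same reboot concern applies. A hypothetical hbase-site.xml entry, with /hbase_temp standing in for whatever persistent directory is chosen:

    <!-- Local temporary directory for HBase; /hbase_temp is a
         hypothetical path, not something from the question. -->
    <property>
      <name>hbase.tmp.dir</name>
      <value>/hbase_temp</value>
    </property>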