# HBase distributed log-splitting keeps failing because it is unable to get a lease
We used up all the free space on our test HDFS cluster, so HBase crashed. After cleaning up some space we were able to restart HBase, but ever since the restart a distributed log-splitting job keeps failing. The job looks like this:

```
Splitting log file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 into a temporary staging area.
```

The regionservers keep trying to get a lease on the file for a while:

```
2013-10-24 11:50:47,662 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or departed
2013-10-24 11:50:47,671 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker host-4,60020,1382614844870 acquired task /hbase/splitlog/hdfs%3A%2F%2F192.168.249.1%3A9000%2Fhdfs%2Fhbase%2F.logs%2Fhost-3%2C60020%2C1382113928374-splitting%2Fhost-3%252C60020%252C1382113928374.1382523937002
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002, length=41274332
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering lease on dfs file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002
2013-10-24 11:50:47,673 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=0 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 1ms
2013-10-24 11:50:50,674 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=1 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 3002ms
2013-10-24 11:50:51,674 DEBUG org.apache.hadoop.hbase.util.FSHDFSUtils: isFileClosed not available
2013-10-24 11:51:51,680 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=2 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 64008ms
```

Then the master aborts the job:

```
2013-10-24 11:55:48,685 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2013-10-24 11:55:48,687 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 interrupted, resigning
java.io.InterruptedIOException
	at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
	at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.InterruptedException: sleep interrupted
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
	... 9 more
```

It seems to me that the problem is that the regionservers cannot get a lease on this file because it is already open for writing, so I checked with `sudo -u hdfs hadoop fsck /hdfs/hbase/.logs/ -openforwrite`, and that confirms it:

```
OPENFORWRITE: /hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 41274332 bytes, 1 block(s), OPENFORWRITE:
/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002: Under replicated blk_1073337163743094520_3534698. Target Replicas is 3 but found 2 replica(s).
```

I tried shutting down HBase completely, but the file stays OPENFORWRITE. How can I remove this flag?

PS: Hadoop 1.0.1, HBase 0.94.12
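For what it's worth, the call that keeps returning `recoverLease=false` in the logs above is HBase's `FSHDFSUtils` asking the NameNode to recover the lease on the file. Below is a minimal sketch of triggering that same call by hand, assuming `DistributedFileSystem.recoverLease(Path)` is available (I believe it ships with the append support in Hadoop 1.0.x). The class name and the polling interval are my own invention:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Hypothetical one-off utility: ask the NameNode to recover the lease on a
// stuck WAL file and poll until the file is closed, roughly what
// FSHDFSUtils.recoverDFSFileLease() does inside HBase.
public class ForceLeaseRecovery {
  public static void main(String[] args) throws Exception {
    // args[0]: full HDFS URI of the stuck file, e.g.
    // hdfs://192.168.249.1:9000/hdfs/hbase/.logs/<server>-splitting/<log-file>
    Path stuckFile = new Path(args[0]);
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(stuckFile.toUri(), conf);
    if (!(fs instanceof DistributedFileSystem)) {
      throw new IllegalStateException("Not an HDFS path: " + stuckFile);
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;

    // recoverLease() returns true once the NameNode has closed the file.
    while (!dfs.recoverLease(stuckFile)) {
      System.out.println("Lease not recovered yet, retrying...");
      Thread.sleep(4000L); // polling interval chosen arbitrarily
    }
    System.out.println("File closed: " + stuckFile);
  }
}
```

On Hadoop 2.7 and later the same operation is exposed on the command line as `hdfs debug recoverLease -path <file>`, but that subcommand does not exist in Hadoop 1.0.1. I am not sure whether forcing lease recovery like this is safe while the split task keeps being retried, which is part of why I am asking.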
 
