
# How can I get Hadoop with Cascading to show me debug log output?
I'm having trouble getting Hadoop and [Cascading](http://www.cascading.org/) 1.2.6 to show me the output that's supposed to come from using the [Debug](http://www.cascading.org/1.2/javadoc/cascading/operation/Debug.html) filter. The [Cascading guide says this is how you can view the current tuples](http://www.cascading.org/1.2/userguide/htmlsingle/#N20F24). I'm using this to try to see any debug output:

```java
Debug debug = new Debug(Debug.Output.STDOUT, true);
debug.setPrintTupleEvery(1);
debug.setPrintFieldsEvery(1);
assembly = new Each( assembly, DebugLevel.VERBOSE, debug );
```

I'm pretty new to Hadoop and Cascading, but it's possible I'm not looking in the right place, or that there's some simple log4j setting that I'm missing (I haven't made any changes to the defaults you get with Cloudera `hadoop-0.20.2-cdh3u3`).
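If it is a log4j thing, my guess is that it would be something along these lines in Hadoop's `log4j.properties`, but this is only a sketch of the kind of setting I mean; I haven't actually changed anything from the defaults:

```properties
# just a guess at the kind of override I might be missing -- not currently set
log4j.logger.cascading=DEBUG
log4j.logger.cascading.flow=DEBUG
```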
This is the WordCount sample class that I'm using (copied from the [Cascading user guide](http://www.cascading.org/1.2/userguide/htmlsingle/#N2009C)) with Debug statements added in:

```java
package org.cascading.example;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.operation.Aggregator;
import cascading.operation.Debug;
import cascading.operation.DebugLevel;
import cascading.operation.Function;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tuple.Fields;

import java.util.Properties;

public class WordCount {
    public static void main(String[] args) {
        String inputPath = args[0];
        String outputPath = args[1];

        // define source and sink Taps.
        Scheme sourceScheme = new TextLine( new Fields( "line" ) );
        Tap source = new Hfs( sourceScheme, inputPath );

        Scheme sinkScheme = new TextLine( new Fields( "word", "count" ) );
        Tap sink = new Hfs( sinkScheme, outputPath, SinkMode.REPLACE );

        // the 'head' of the pipe assembly
        Pipe assembly = new Pipe( "wordcount" );

        // For each input Tuple
        // using a regular expression
        // parse out each word into a new Tuple with the field name "word"
        String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
        Function function = new RegexGenerator( new Fields( "word" ), regex );
        assembly = new Each( assembly, new Fields( "line" ), function );

        Debug debug = new Debug(Debug.Output.STDOUT, true);
        debug.setPrintTupleEvery(1);
        debug.setPrintFieldsEvery(1);
        assembly = new Each( assembly, DebugLevel.VERBOSE, debug );

        // group the Tuple stream by the "word" value
        assembly = new GroupBy( assembly, new Fields( "word" ) );

        // For every Tuple group
        // count the number of occurrences of "word" and store result in
        // a field named "count"
        Aggregator count = new Count( new Fields( "count" ) );
        assembly = new Every( assembly, count );

        // initialize app properties, tell Hadoop which jar file to use
        Properties properties = new Properties();
        FlowConnector.setApplicationJarClass( properties, WordCount.class );

        // plan a new Flow from the assembly using the source and sink Taps
        FlowConnector flowConnector = new FlowConnector();
        FlowConnector.setDebugLevel( properties, DebugLevel.VERBOSE );
        Flow flow = flowConnector.connect( "word-count", source, sink, assembly );

        // execute the flow, block until complete
        flow.complete();

        // Ask Cascading to create a GraphViz DOT file
        // brew install graphviz # install viewer to look at dot file
        flow.writeDOT("build/flow.dot");
    }
}
```

It works fine, I just can't find any debug statements anywhere showing me the words. I've looked both through the HDFS filesystem with `hadoop dfs -ls` as well as through the [jobtracker web ui](http://localhost:50030/jobtracker.jsp). The log output for a mapper in the jobtracker doesn't have any output for STDOUT:

```
Task Logs: 'attempt_201203131143_0022_m_000000_0'

stdout logs

stderr logs
2012-03-13 14:32:24.642 java[74752:1903] Unable to load realm info from SCDynamicStore

syslog logs
2012-03-13 14:32:24,786 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2012-03-13 14:32:25,278 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-03-13 14:32:25,617 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2012-03-13 14:32:25,903 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : null
2012-03-13 14:32:25,945 INFO cascading.tap.hadoop.MultiInputSplit: current split input path: hdfs://localhost/usr/tnaleid/shakespeare/input/comedies/cymbeline
2012-03-13 14:32:25,980 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2012-03-13 14:32:25,988 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2012-03-13 14:32:26,002 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2012-03-13 14:32:26,246 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2012-03-13 14:32:26,247 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2012-03-13 14:32:27,623 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2012-03-13 14:32:28,274 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2012-03-13 14:32:28,310 INFO org.apache.hadoop.mapred.Task: Task:attempt_201203131143_0022_m_000000_0 is done. And is in the process of commiting
2012-03-13 14:32:28,337 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201203131143_0022_m_000000_0' done.
2012-03-13 14:32:28,361 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
```

At the end, I'm also writing out the DOT file, which does not have the Debug statement in it that I'd expect (though maybe those are stripped out):

![word count flow diagram](https://i.stack.imgur.com/T1rqd.png)
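(For reference, I'm generating that image from the DOT file with graphviz, roughly like this; the output filename is just what I picked:)

```sh
dot -Tpng build/flow.dot -o flow.png
open flow.png   # viewing it on OS X
```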
Is there some log file somewhere that I'm missing, or is it a config setting that I need to set?
 
