
Hadoop - constructor args for mapper
<p>Is there any way to give constructor args to a Mapper in Hadoop? Possibly through some library that wraps the Job creation?</p> <p>Here's my scenario:</p> <pre><code>public class HadoopTest {

    // Extractor turns a line into a "feature"
    public static interface Extractor {
        public String extract(String s);
    }

    // A concrete Extractor, configurable with a constructor parameter
    public static class PrefixExtractor implements Extractor {
        private int endIndex;

        public PrefixExtractor(int endIndex) {
            this.endIndex = endIndex;
        }

        public String extract(String s) {
            return s.substring(0, this.endIndex);
        }
    }

    public static class Map extends Mapper&lt;Object, Text, Text, Text&gt; {
        private Extractor extractor;

        // Constructor configures the extractor
        public Map(Extractor extractor) {
            this.extractor = extractor;
        }

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String feature = extractor.extract(value.toString());
            context.write(new Text(feature), new Text(value.toString()));
        }
    }

    public static class Reduce extends Reducer&lt;Text, Text, Text, Text&gt; {
        public void reduce(Text key, Iterable&lt;Text&gt; values, Context context)
                throws IOException, InterruptedException {
            for (Text val : values)
                context.write(key, val);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "test");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
</code></pre> <p>As should be clear, since the Mapper is only given to the <code>Configuration</code> as a class reference (<code>Map.class</code>), Hadoop has no way to pass a constructor argument and configure a specific Extractor.</p> <p>There are Hadoop-wrapping frameworks out there like Scoobi, Crunch, and Scrunch (and probably many more I don't know about) that seem to have this capability, but I don't know how they accomplish it. <strong>EDIT:</strong> After working with Scoobi some more, I discovered I was partially wrong about this. If you use an externally defined object in the "mapper", Scoobi requires that it be serializable, and will complain at runtime if it isn't. So maybe the right way is just to make my <code>Extractor</code> serializable and deserialize it in the Mapper's setup method...</p> <p>Also, I actually work in Scala, so Scala-based solutions are definitely welcome (if not encouraged!)</p>
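<p>To make the setup-method idea from the EDIT concrete, here is a minimal sketch of the round-trip it implies: make <code>Extractor</code> extend <code>java.io.Serializable</code>, serialize the configured instance to a Base64 string that you would stash with <code>conf.set("extractor", ...)</code>, and rebuild it from <code>context.getConfiguration().get("extractor")</code> inside <code>Mapper.setup()</code>. The Hadoop calls are shown only in comments so the sketch is self-contained; the key/class names are illustrative, not from any library.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class ExtractorRoundTrip {

    // Same Extractor as in the question, made Serializable so it can survive the trip.
    public interface Extractor extends Serializable {
        String extract(String s);
    }

    public static class PrefixExtractor implements Extractor {
        private final int endIndex;
        public PrefixExtractor(int endIndex) { this.endIndex = endIndex; }
        public String extract(String s) { return s.substring(0, endIndex); }
    }

    // Driver side: encode the configured object into a String,
    // which you would then store with conf.set("extractor", encoded).
    public static String toConfString(Extractor e) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(e);
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Mapper side: decode in setup(Context), reading the string from
    // context.getConfiguration().get("extractor").
    public static Extractor fromConfString(String s)
            throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(s);
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Extractor) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Extractor original = new PrefixExtractor(3);
        String encoded = toConfString(original);      // would be conf.set("extractor", encoded)
        Extractor restored = fromConfString(encoded); // would run in Mapper.setup()
        System.out.println(restored.extract("foobar")); // prints "foo"
    }
}
```

<p>This keeps the Mapper's no-arg constructor (which Hadoop requires for reflective instantiation) while still letting the driver configure it; the only requirement is that everything reachable from the Extractor is serializable.</p>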