
# Hadoop MapReduce, Java implementation questions
Currently I'm working with Apache Hadoop (writing the MapReduce jobs in Java). I have looked into some examples (like the WordCount example), and I have had success playing around with writing custom MapReduce apps (I'm using the Cloudera Hadoop Demo VM). My question is about some implementation and runtime details.

The prototype of the job class is as follows:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            // mapping
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            // reducing
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        // setting map and reduce classes, and various configs
        JobClient.runJob(conf);
    }
}
```

I have some questions. I tried to google them, but I must say that the documentation on Hadoop is very formal (like a big reference book), not well suited for beginners.

My questions:

- Do the Map and Reduce classes have to be static inner classes of the main class, or can they be anywhere (just visible from main)?
- Can you use everything that Java SE and the available libraries have to offer, like in an ordinary Java SE app? I mean things like JAXB, Guava, Jackson for JSON, etc.
- What is the best practice for writing generic solutions? I mean: we want to process big amounts of log files in different (but slightly similar) ways. The last token of each log row is always a JSON map with some entries. One processing could be: count and group the log rows by (keyA, keyB from the map), and another could be: count and group the log rows by (keyX, keyY from the map). (I'm thinking of some config-file-based solution where you provide the entries that are actually needed to the program, so if you need a new resolution, you just have to provide the config and run the app; see the sketch at the end of this question.)
- Possibly relevant: in the WordCount example the Map and Reduce classes are static inner classes, and main() has zero influence on them, it just provides these classes to the framework. Can you make these classes non-static, and give them some fields and a constructor to alter the run with some current values (like the config parameters I mentioned)?

Maybe I'm digging into the details unnecessarily. The overall question is: is a Hadoop MapReduce program still the normal Java SE app we are used to?
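To make the config-file idea above concrete, here is a rough sketch of what I have in mind (the property name `log.group.keys`, the `LogGroupMapper` class, and the `extractValue` helper are all made up for illustration): the driver puts the grouping keys into the `JobConf`, and the mapper reads them back in `configure()`, since the framework instantiates the mapper class itself and cannot call a custom constructor.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper that is parameterized through the job configuration
// instead of through a constructor.
public class LogGroupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private String[] groupKeys;

    @Override
    public void configure(JobConf conf) {
        // The driver would set this, e.g. conf.set("log.group.keys", "keyA,keyB")
        groupKeys = conf.get("log.group.keys", "").split(",");
    }

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // The last whitespace-separated token of the row is the JSON map;
        // a real implementation would parse it with e.g. Jackson.
        String[] tokens = line.toString().split("\\s+");
        String json = tokens[tokens.length - 1];

        // Build a composite grouping key from the configured entries.
        StringBuilder composite = new StringBuilder();
        for (String key : groupKeys) {
            composite.append(extractValue(json, key)).append('|');
        }
        output.collect(new Text(composite.toString()), ONE);
    }

    // Placeholder for real JSON extraction (Jackson, etc.).
    private String extractValue(String json, String key) {
        return "";
    }
}
```

With a standard `IntWritable`-summing reducer, running the same jar with a different `log.group.keys` value would then produce the (keyX, keyY) grouping without any code change. Whether this is the idiomatic way to do it is part of what I'm asking.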