
# Hadoop MapReduce, Java implementation questions
Currently I'm working with Apache Hadoop (writing the MapReduce jobs in Java). I have looked into some examples (like the WordCount example), and I have had success playing around with writing custom MapReduce apps (I'm using the Cloudera Hadoop Demo VM). My question is about some implementation and runtime details.

The prototype of the job class is as follows:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            // mapping
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            // reducing
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        // setting map and reduce classes, and various configs
        JobClient.runJob(conf);
    }
}
```

I have some questions. I tried to google them, but I must say that the documentation on Hadoop is very formal (like a big reference book), not well suited for beginners.

My questions:

- Do the Map and Reduce classes have to be static inner classes of the main class, or can they be anywhere (just visible from main)?
- Can you use everything that Java SE and the available libraries have to offer, like in an ordinary Java SE app? I mean things like JAXB, Guava, Jackson for JSON, etc.
- What is the best practice for writing generic solutions? I mean: we want to process big amounts of log files in different (but slightly similar) ways. The last token of each log row is always a JSON map with some entries. One processing could be: count and group the log rows by (keyA, keyB from the map), and another could be: count and group the log rows by (keyX, keyY from the map). (I'm thinking of some config-file-based solution where you provide the entries that are actually needed to the program, so if you need a new resolution, you just have to provide the config and run the app; see the sketch at the end of this question.)
- Possibly relevant: in the WordCount example the Map and Reduce classes are static inner classes, and main() has zero influence on them, it just provides these classes to the framework. Can you make these classes non-static, and give them some fields and a constructor to alter the run with some current values (like the config parameters I mentioned)?

Maybe I'm digging into the details unnecessarily. The overall question is: is a Hadoop MapReduce program still the normal Java SE app we are used to?
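To make the config-file idea above concrete, here is a rough sketch of what I have in mind (the property name `log.group.keys`, the `LogGroupMapper` class, and the `extractValue` helper are all made up for illustration): the driver puts the grouping keys into the `JobConf`, and the mapper reads them back in `configure()`, since the framework instantiates the mapper class itself and cannot call a custom constructor.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper that is parameterized through the job configuration
// instead of through a constructor.
public class LogGroupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private String[] groupKeys;

    @Override
    public void configure(JobConf conf) {
        // The driver would set this, e.g. conf.set("log.group.keys", "keyA,keyB")
        groupKeys = conf.get("log.group.keys", "").split(",");
    }

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // The last whitespace-separated token of the row is the JSON map;
        // a real implementation would parse it with e.g. Jackson.
        String[] tokens = line.toString().split("\\s+");
        String json = tokens[tokens.length - 1];

        // Build a composite grouping key from the configured entries.
        StringBuilder composite = new StringBuilder();
        for (String key : groupKeys) {
            composite.append(extractValue(json, key)).append('|');
        }
        output.collect(new Text(composite.toString()), ONE);
    }

    // Placeholder for real JSON extraction (Jackson, etc.).
    private String extractValue(String json, String key) {
        return "";
    }
}
```

With a standard `IntWritable`-summing reducer, running the same jar with a different `log.group.keys` value would then produce the (keyX, keyY) grouping without any code change. Whether this is the idiomatic way to do it is part of what I'm asking.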