Here are your answers.

1. The mapper and reducer classes can be separate Java classes, anywhere in the package structure, or even in separate jar files, as long as the class loader of the MapTask/ReduceTask is able to load them. The example you showed is intended for quick testing by Hadoop beginners.

2. Yes, you can use any Java libraries. These third-party jars should be made available to the MapTask/ReduceTask either through the `-files` option of the `hadoop jar` command or through the Hadoop API. See [this post](http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/) for more information on adding third-party libraries to the Map/Reduce classpath.

3. Yes, you can configure and pass configurations to the Map/Reduce jobs using either of these approaches.

   3.1 Use the `org.apache.hadoop.conf.Configuration` object to set the configurations in the client program (the Java class with the `main()` method):

   ```java
   Configuration conf = new Configuration();
   conf.set("config1", "value1");
   Job job = new Job(conf, "Whole File input");
   ```

   The Map/Reduce programs have access to the `Configuration` object and can read the values set for these properties using the `get()` method (see the mapper sketch below). This approach is advisable if the configuration settings are small.

   3.2 Use the distributed cache to load the configurations and make them available to the Map/Reduce programs; see the [DistributedCache documentation](http://hadoop.apache.org/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.htm) for details (a sketch of this approach also follows below). This approach is more advisable for larger configuration data.

4. The `main()` method is the client program, and it is responsible for configuring and submitting the Hadoop job: the Mapper class, Reducer class, input path, output path, InputFormat class, number of reducers, and so on. If a setting is not provided, the default is used. For an example, see the driver sketch below. Additionally, look at the documentation on [job configuration](http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Job+Configuration).

Yes, Map/Reduce programs are still Java SE programs; however, they are distributed across the machines in the Hadoop cluster. Say the Hadoop cluster has 100 nodes and you submit the word count example: the Hadoop framework creates a Java process for each Map and Reduce task and calls the callback methods such as `map()`/`reduce()` on the subset of machines where the data exists. Essentially, your mapper/reducer code gets executed on the machines where the data lives. I would recommend reading Chapter 6 of [Hadoop: The Definitive Guide](http://rads.stackoverflow.com/amzn/click/0596521979).

I hope this helps.
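As a minimal sketch of approach 3.1 (my own illustration, not code from the original post), a mapper can read the property set in the driver through `Context.getConfiguration()`. The property name `config1` matches the snippet above; the class name `ConfigAwareMapper` and the key/value types are hypothetical choices:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that reads a job-level property set by the client program.
public class ConfigAwareMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private String config1;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The same Configuration populated in main() is available to the task.
        Configuration conf = context.getConfiguration();
        config1 = conf.get("config1", "default-value");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Use the configured value however the job logic requires;
        // here it is simply emitted once per input record as a trivial example.
        context.write(new Text(config1), new IntWritable(1));
    }
}
```

`setup()` runs once per task before any `map()` calls, so it is the natural place to read job-level properties.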
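For approach 3.2, here is a hedged sketch using the old `org.apache.hadoop.filecache.DistributedCache` API described by the 0.20.2 documentation linked above (newer releases expose the same idea through `Job.addCacheFile()`). The cached file path and the use of a Java `Properties` file are assumptions for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that loads its settings from a file placed in the distributed cache.
// In the driver, before submitting the job (the path is a placeholder):
//   DistributedCache.addCacheFile(new URI("/config/job-settings.properties"), conf);
public class CachedConfigMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Properties jobSettings = new Properties();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Each task finds the cached file on its local disk.
        Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (localFiles != null && localFiles.length > 0) {
            try (BufferedReader reader = new BufferedReader(new FileReader(localFiles[0].toString()))) {
                jobSettings.load(reader);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Use jobSettings.getProperty(...) as the job logic requires.
        context.write(value, new IntWritable(1));
    }
}
```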
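Finally, the driver sketch referenced in point 4. To keep it self-contained it reuses Hadoop's built-in `TokenCounterMapper` and `IntSumReducer` for a word count job; the class name `WordCountDriver`, the `config1` property, and the reducer count are illustrative assumptions, and the `new Job(conf, ...)` constructor matches the Hadoop releases linked above (newer code would call `Job.getInstance(conf)`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Hypothetical client program: configures and submits the job, then waits for completion.
public class WordCountDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("config1", "value1");                 // job-level property, as in 3.1

        Job job = new Job(conf, "word count");         // old-style constructor
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenCounterMapper.class);  // built-in mapper: emits (word, 1)
        job.setReducerClass(IntSumReducer.class);      // built-in reducer: sums the counts

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setNumReduceTasks(2);                      // number of reducers

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and block until it finishes; exit non-zero on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

It would be launched with something like `hadoop jar your-job.jar WordCountDriver <input path> <output path>`; to accept generic options such as `-files` from point 2, the driver would typically implement `Tool` and be run through `ToolRunner`.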