<h1>1) Intro / Problem</h1>
<p>Before going ahead with the job driver, it is important to understand that in a simple-minded approach, the values reaching each reducer should be <strong>sorted</strong> in ascending order. The first thought is to pass the value list unsorted and do some sorting in the reducer per key. This has two disadvantages:</p>
<p>1) It is most probably <strong>not efficient</strong> for large value lists</p>
<p>and</p>
<p>2) How will the framework know that (1,4) is equal to (4,1) if these pairs are processed in different parts of the cluster?</p>
<h1>2) Solution in theory</h1>
<p>The way to do it in Hadoop is to "mock" the framework by creating a <strong>synthetic key</strong>.</p>
<p>So, instead of the "conceptually more appropriate" (if I may say that)</p>
<p><code>map(k1, v1) -&gt; list(k2, v2)</code></p>
<p>our map function is the following:</p>
<p><code>map(k1, v1) -&gt; list(ksynthetic, null)</code></p>
<p>As you notice, we discard the usage of values (the reducer still gets a list of <code>null</code> values, but we don't really care about them). What happens here is that these values are actually <strong>included</strong> in <code>ksynthetic</code>. Here is an example for the problem in question:</p>
<p><code>map(1, 2) -&gt; list([1,2], null)</code></p>
<p>However, some more operations need to be done so that the keys are grouped and partitioned appropriately and we achieve the correct result in the reducer.</p>
<h1>3) Hadoop Implementation</h1>
<p>We will implement a class called <code>FFGroupComparator</code> and a class <code>FindFriendPartitioner</code>.</p>
<p>Here is our <code>FFGroupComparator</code>:</p>
<pre><code>public static class FFGroupComparator extends WritableComparator {

    protected FFGroupComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        Text t1 = (Text) w1;
        Text t2 = (Text) w2;
        String[] t1Items = t1.toString().split(",");
        String[] t2Items = t2.toString().split(",");
        String t1Base = t1Items[0];
        String t2Base = t2Items[0];
        // Compare using only the "real" key part of our synthetic key
        return t1Base.compareTo(t2Base);
    }
}
</code></pre>
<p>This class will act as our grouping comparator class. It controls which keys are grouped together for a single call to <code>Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)</code>. This is very important, as it ensures that each reducer gets the appropriate synthetic keys (judging by the real key).</p>
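<p>To see what the grouping comparator buys us, here is a small walk-through (illustrative only, not part of the job code) of the shuffle for the sample input used below, given that our mapper also emits each reversed pair:</p>
<pre><code>Synthetic keys emitted by the mappers for the input 1,2  2,1  1,3  3,2  2,4  4,1
(each input pair plus its reverse):

  [1,2] [2,1] [2,1] [1,2] [1,3] [3,1] [3,2] [2,3] [2,4] [4,2] [4,1] [1,4]

After sorting, the grouping comparator merges keys with the same real part
(the part before the comma) into one reduce() call each:

  group "1": [1,2] [1,2] [1,3] [1,4]
  group "2": [2,1] [2,1] [2,3] [2,4]
  group "3": [3,1] [3,2]
  group "4": [4,1] [4,2]

(The duplicates appear because the input already contains both 1,2 and 2,1.)
</code></pre>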
<p>Because Hadoop runs on a cluster with many nodes, we must also make sure that all synthetic keys sharing the same real key end up in the same partition (and thus on the same reduce task). So the partitioning decision has to be based on the <strong>real keys</strong> (not the synthetic ones). Usually this is done with hash values; in our case, we compute the partition a synthetic key belongs to from the hash value of the real key (the part before the comma). So our <code>FindFriendPartitioner</code> is as follows:</p>
<pre><code>public static class FindFriendPartitioner extends Partitioner&lt;Text, NullWritable&gt; {

    @Override
    public int getPartition(Text key, NullWritable value, int numPartitions) {
        String[] keyItems = key.toString().split(",");
        String keyBase = keyItems[0];
        // Mask the sign bit so the partition index is never negative
        return (keyBase.hashCode() &amp; Integer.MAX_VALUE) % numPartitions;
    }
}
</code></pre>
<p>So now we are all set to write the actual job and solve our problem.</p>
<p>I am assuming your input file looks like this:</p>
<pre><code>1,2
2,1
1,3
3,2
2,4
4,1
</code></pre>
<p>We will use the <code>TextInputFormat</code>.</p>
<p>Here are the mapper and reducer classes (both nested inside our <code>FindFriendTwo</code> job class), using Hadoop 1.0.4:</p>
<pre><code>public static class FindFriendMapper
        extends Mapper&lt;Object, Text, Text, NullWritable&gt; {

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the pair as read from the input...
        context.write(value, NullWritable.get());
        // ...and also emit the reverse relationship
        String[] tempStrings = value.toString().split(",");
        Text value2 = new Text(tempStrings[1] + "," + tempStrings[0]);
        context.write(value2, NullWritable.get());
    }
}
</code></pre>
<p>Notice that we also pass the reverse relationships in the <code>map</code> function.</p>
<p>For example, if the input pair is (1,4) we must not forget (4,1).</p>
<pre><code>// needs: import java.util.Set; import java.util.LinkedHashSet;
public static class FindFriendReducer
        extends Reducer&lt;Text, NullWritable, IntWritable, IntWritable&gt; {

    private Set&lt;String&gt; friendsSet;

    @Override
    public void setup(Context context) {
        friendsSet = new LinkedHashSet&lt;String&gt;();
    }

    // Thanks to FFGroupComparator, reduce() is called once per real key;
    // syntheticKey holds the first synthetic key of that group.
    @Override
    public void reduce(Text syntheticKey, Iterable&lt;NullWritable&gt; values, Context context)
            throws IOException, InterruptedException {
        String[] tempKeys = syntheticKey.toString().split(",");
        friendsSet.add(tempKeys[1]);
        if (friendsSet.size() == 2) {
            IntWritable key = new IntWritable(Integer.parseInt(tempKeys[0]));
            IntWritable value = new IntWritable(Integer.parseInt(tempKeys[1]));
            context.write(key, value);
        }
    }
}
</code></pre>
<p>Finally, we must remember to include the following in our main class, so that the framework uses our classes:</p>
<pre><code>job.setGroupingComparatorClass(FFGroupComparator.class);
job.setPartitionerClass(FindFriendPartitioner.class);
</code></pre>
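<p>The main class itself is not shown above. For completeness, here is a minimal sketch of what the <code>FindFriendTwo</code> driver could look like with the Hadoop 1.0.x <code>mapreduce</code> API. Apart from the classes defined above, everything here (job name, argument handling) is my own assumption, not part of the original answer:</p>
<pre><code>import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// main() goes inside public class FindFriendTwo, next to the
// mapper, reducer, comparator and partitioner defined above.
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "findfriendtwo"); // Job.getInstance(conf) in later versions

    job.setJarByClass(FindFriendTwo.class);
    job.setMapperClass(FindFriendMapper.class);
    job.setReducerClass(FindFriendReducer.class);

    // Wire in the secondary sort machinery described above
    job.setGroupingComparatorClass(FFGroupComparator.class);
    job.setPartitionerClass(FindFriendPartitioner.class);

    // The mappers emit (synthetic key, null) pairs
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(NullWritable.class);

    // The reducers emit (person, friend) pairs as integers
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
</code></pre>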
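<p>One more note: the whole approach relies on the synthetic keys reaching each reducer in ascending order. With <code>Text</code> keys, the default byte-wise comparison already orders <code>[1,2] &lt; [1,3] &lt; [2,1]</code>, which is enough for the single-digit IDs in the sample input. If your IDs can grow beyond one digit (as a string, "10" sorts before "2"), you would also want a numeric sort comparator along these lines (<code>FFSortComparator</code> is a name I made up; it is not in the original answer):</p>
<pre><code>public static class FFSortComparator extends WritableComparator {

    protected FFSortComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        String[] a = w1.toString().split(",");
        String[] b = w2.toString().split(",");
        // Order numerically: first by the real key, then by the friend id
        int byRealKey = Integer.valueOf(a[0]).compareTo(Integer.valueOf(b[0]));
        if (byRealKey != 0) {
            return byRealKey;
        }
        return Integer.valueOf(a[1]).compareTo(Integer.valueOf(b[1]));
    }
}
</code></pre>
<p>It would be registered in the driver with <code>job.setSortComparatorClass(FFSortComparator.class);</code>, and the grouping comparator should then also compare the real keys numerically so that sorting and grouping stay consistent.</p>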