StackOverflow2013


plurals

POWire up a Hadoop Jobfactorybean, multiple Reducers on single Hadoop Node
primarykey
Id
12490889
data
AcceptedAnswerId
12584380
AnswerCount
1
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2012-09-19T08:20:19.283
FavoriteCount
0
LastActivityDate
2012-09-25T13:50:55.307
LastEditDate
2012-09-25T08:25:03.097
LastEditorUserId
1637543
OwnerUserId
1637543
ParentId
0
PostTypeId
1
Score
0
ViewCount
630
LastEditorDisplayName
text
Body
What I want to achieve: I have set up a Spring Batch Job containing Hadoop Tasks to process some larger files. To get multiple Reducers running for the job, I need to set the number of Reducers with <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks%28int%29" rel="nofollow">setNumReduceTasks</a>. I'm trying to set this via the <a href="http://static.springsource.org/spring-hadoop/docs/1.0.0.M2/api/org/springframework/data/hadoop/mapreduce/JobFactoryBean.html" rel="nofollow">JobFactoryBean</a>. My bean configuration in classpath:/META-INF/spring/batch-common.xml: <pre><code><?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd"> <bean id="jobFactoryBean" class="org.springframework.data.hadoop.mapreduce.JobFactoryBean" p:numberReducers="5"/> <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" /> <bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/> <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher" p:jobRepository-ref="jobRepository" /> </beans> </code></pre> The XML is included via: <pre><code> <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context" xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd"> <context:property-placeholder 
location="classpath:batch.properties,classpath:hadoop.properties" ignore-resource-not-found="true" ignore-unresolvable="true" /> <import resource="classpath:/META-INF/spring/batch-common.xml" /> <import resource="classpath:/META-INF/spring/hadoop-context.xml" /> <import resource="classpath:/META-INF/spring/sort-context.xml" /> </beans> </code></pre> I'm getting the beans for the JUnit test via <pre><code> JobLauncher launcher = ctx.getBean(JobLauncher.class); Map<String, Job> jobs = ctx.getBeansOfType(Job.class); JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class); </code></pre> The JUnit test stops with an error: <pre><code>No bean named '&jobFactoryBean' is defined </code></pre> So: the JobFactoryBean is not loaded, but the other beans are loaded correctly and without an error. Without the line <pre><code>JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class); </code></pre> the project's tests run, but there is just one Reducer per job. The method <pre><code>ctx.getBean("jobFactoryBean"); </code></pre> returns a Hadoop Job. I would expect to get the factory bean there... To test it, I have extended the constructor of the Reducer to log each creation of a Reducer, so that I get a notification whenever one is instantiated. So far I just get one entry in the log. I have two VMs with 2 assigned cores and 2 GB of RAM each, and I'm trying to sort a 75 MB file consisting of multiple books from Project Gutenberg. EDIT: Another thing I have tried is setting the number of reducers on the Hadoop job via its number-reducers attribute, without any effect. 
<pre><code><job id="search-jobSherlockOk" input-path="${sherlock.input.path}" output-path="${sherlockOK.output.path}" mapper="com.romediusweiss.hadoopSort.mapReduce.SortMapperWords" reducer="com.romediusweiss.hadoopSort.mapReduce.SortBlockReducer" partitioner="com.romediusweiss.hadoopSort.mapReduce.SortPartitioner" number-reducers="2" validate-paths="false" /> </code></pre> <ul> <li>The settings in mapreduce-site.xml on both nodes are: <pre><code><property> <name>mapred.tasktracker.reduce.tasks.maximum</name> <value>10</value> </property> </code></pre></li> </ul> ...and why: I want to reproduce the example from the following blog post: <a href="http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/" rel="nofollow">http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/</a> I need either several Reducers on the same machine or a fully distributed environment to test the behaviour of the Partitioner. The first approach would be easier. P.S.: Could a user with a higher reputation create a tag "spring-data-hadoop"? Thank you!
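For reference, the partitioning behaviour this question wants to observe can be sketched without a cluster. The block below is a minimal illustration in plain Java, assuming the hash rule used by Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks. The class name PartitionSketch and the sample keys are hypothetical, and String.hashCode() stands in for the key's real hash (Hadoop's Text hashes differently), so the custom SortPartitioner from the question may well assign keys differently.

```java
// Hypothetical stand-alone sketch; it does not use or depend on Hadoop.
class PartitionSketch {

    // Assumed rule, modelled on Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Illustrative keys only; not taken from the question's data set.
        String[] keys = {"sherlock", "holmes", "watson", "baker", "street"};
        for (int reducers : new int[] {1, 2, 5}) {
            System.out.println("reducers = " + reducers);
            for (String key : keys) {
                System.out.println("  " + key + " -> partition "
                        + partitionFor(key, reducers));
            }
        }
    }
}
```

With a single reducer every key lands in partition 0, so a partitioner's choices only become visible once the job really runs with two or more reduce tasks.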
Tags
<spring><hadoop><spring-data>
Title
Wire up a Hadoop Jobfactorybean, multiple Reducers on single Hadoop Node
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USromedius
UserOwnerUserId
1. USromedius
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POWire up a Hadoop Jobfactorybean, multiple Reducers on single Hadoop Node
 UserUserId
 USromedius
 VoteTypeVoteTypeId
 VTBountyStart
CommentsPostId
