Wire up a Hadoop JobFactoryBean, multiple Reducers on single Hadoop node
    <p><strong>What I want to achieve:</strong></p>
    <p>I have set up a Spring Batch job containing Hadoop tasks to process some larger files. To get multiple Reducers running for the job, I need to set the number of Reducers with <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks%28int%29" rel="nofollow">setNumReduceTasks</a>. I'm trying to set this via the <a href="http://static.springsource.org/spring-hadoop/docs/1.0.0.M2/api/org/springframework/data/hadoop/mapreduce/JobFactoryBean.html" rel="nofollow">JobFactoryBean</a>.</p>
    <p>My bean configuration in classpath:/META-INF/spring/batch-common.xml:</p>
    <pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd"&gt;

    &lt;bean id="jobFactoryBean" class="org.springframework.data.hadoop.mapreduce.JobFactoryBean" p:numberReducers="5"/&gt;
    &lt;bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" /&gt;
    &lt;bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/&gt;
    &lt;bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher" p:jobRepository-ref="jobRepository" /&gt;
&lt;/beans&gt;
</code></pre>
    <p>The XML is included via:</p>
    <pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd"&gt;

    &lt;context:property-placeholder location="classpath:batch.properties,classpath:hadoop.properties"
        ignore-resource-not-found="true" ignore-unresolvable="true" /&gt;

    &lt;import resource="classpath:/META-INF/spring/batch-common.xml" /&gt;
    &lt;import resource="classpath:/META-INF/spring/hadoop-context.xml" /&gt;
    &lt;import resource="classpath:/META-INF/spring/sort-context.xml" /&gt;
&lt;/beans&gt;
</code></pre>
    <p>I'm getting the beans for the JUnit test via</p>
    <pre><code>JobLauncher launcher = ctx.getBean(JobLauncher.class);
Map&lt;String, Job&gt; jobs = ctx.getBeansOfType(Job.class);
JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class);
</code></pre>
    <p>The JUnit test stops with an error:</p>
    <pre><code>No bean named '&amp;jobFactoryBean' is defined
</code></pre>
    <p>So the JobFactoryBean is not loaded, but the other beans are loaded correctly and without an error.</p>
    <p>Without the line</p>
    <pre><code>JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class);
</code></pre>
    <p>the project's tests run, but there is just one Reducer per job.</p>
    <p>The call</p>
    <pre><code>ctx.getBean("jobFactoryBean");
</code></pre>
    <p>returns a Hadoop Job. I would expect to get the factory bean there...</p>
    <p>To test this, I extended the Reducer's constructor to log each time a Reducer is created. So far I get just one entry in the log.</p>
    <p>I have two VMs with 2 assigned cores and 2 GB RAM each, and I'm trying to sort a 75 MB file consisting of multiple books from Project Gutenberg.</p>
    <p>EDIT:</p>
    <p>Another thing I have tried is setting the number of reducers on the Hadoop job via the number-reducers attribute, without effect.</p>
    <pre><code>&lt;job id="search-jobSherlockOk"
    input-path="${sherlock.input.path}"
    output-path="${sherlockOK.output.path}"
    mapper="com.romediusweiss.hadoopSort.mapReduce.SortMapperWords"
    reducer="com.romediusweiss.hadoopSort.mapReduce.SortBlockReducer"
    partitioner="com.romediusweiss.hadoopSort.mapReduce.SortPartitioner"
    number-reducers="2"
    validate-paths="false" /&gt;
</code></pre>
    <ul>
    <li><p>The settings in mapreduce-site.xml on both nodes are:</p>
    <pre><code>&lt;property&gt;
    &lt;name&gt;mapred.tasktracker.reduce.tasks.maximum&lt;/name&gt;
    &lt;value&gt;10&lt;/value&gt;
&lt;/property&gt;
</code></pre></li>
    </ul>
    <p><strong>...and why:</strong></p>
    <p>I want to reproduce the example from the following blog post: <a href="http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/" rel="nofollow">http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/</a></p>
    <p>I need several Reducers, either on the same machine or in a fully distributed environment, to test the behaviour of the Partitioner. The first approach would be easier.</p>
    <p>P.S.: Could a user with higher reputation create a tag "spring-data-hadoop"? Thank you!</p>
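    <p>Why a single Reducer makes it impossible to observe Partitioner behaviour can be illustrated without Hadoop at all. The following is only a minimal sketch in plain Java: the class name <code>PartitionSketch</code>, its methods, and the sample words are hypothetical and not part of the project above. It assumes the hash-modulo formula used by Hadoop's default HashPartitioner in place of the custom SortPartitioner, whose logic is not shown here.</p>

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not part of the question's project.
public class PartitionSketch {

    // Assumed formula: non-negative hash of the key modulo the number of reduce tasks,
    // as in Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many keys are assigned to each partition index.
    static Map<Integer, Integer> distribute(String[] keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative sample keys only.
        String[] words = {"sherlock", "holmes", "watson", "baker", "street"};
        System.out.println("1 reducer:  " + distribute(words, 1));
        System.out.println("2 reducers: " + distribute(words, 2));
    }
}
```

    <p>With one reduce task every key necessarily maps to partition 0, so any partitioning logic is invisible; only with two or more reduce tasks can keys be spread across partitions.</p>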