<p>I write all my Hadoop MapReduce jobs in Python, but you do not need Python to move the data. <strong>Use Sqoop</strong>: <a href="http://sqoop.apache.org/" rel="nofollow">http://sqoop.apache.org/</a></p>
<p>Sqoop is an open-source tool for transferring data between relational databases and Hadoop: it can import tables into Hadoop for further processing and export HDFS files back into a database. It is very simple to use. All you need to do is:</p>
<ol>
<li>Download and configure Sqoop.</li>
<li>Create your MySQL table schema.</li>
<li>Specify the HDFS file name, the result table name, and the column separator.</li>
</ol>
<p>Read this for more info: <a href="http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html" rel="nofollow">http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html</a></p>
<p>The <strong>advantage</strong> of using Sqoop is that we can move our HDFS data into a relational store (MySQL, Derby, Hive, etc.), and vice versa, with a single command.</p>
<p>For your use case, adapt the following as needed:</p>
<p>mapper.py</p>
<pre><code>#!/usr/bin/env python
import sys

# Emit "campaignID:adGroupID:" for every input line that contains "&lt;row".
for line in sys.stdin:
    line = line.strip()
    if line.find("&lt;row") != -1:
        words = line.split(' ')
        # Each word is assumed to look like name="value"; keep the quoted part.
        campaignID = words[1].split('"')[1]
        adGroupID = words[2].split('"')[1]
        print("%s:%s:" % (campaignID, adGroupID))
</code></pre>
<p>streaming command</p>
<pre><code>bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar \
    -file /path/to/mapper.py -mapper /path/to/mapper.py \
    -file /path/to/reducer.py -reducer /path/to/reducer.py \
    -input /user/input -output /user/output
</code></pre>
<p>mysql</p>
<pre><code>create database test;
use test;
create table testtable (
    a varchar(100),
    b varchar(100)
);
</code></pre>
<p>sqoop</p>
<pre><code>./sqoop export --connect jdbc:mysql://localhost/test --username root --table testtable --export-dir /user/output --input-fields-terminated-by ':'
</code></pre>
<p><strong>Note</strong>:</p>
<ol>
<li>Change the mapper to fit your needs.</li>
<li>I have used ':' as the column separator in both the mapper and the sqoop command. Change it as needed.</li>
<li>Sqoop tutorials: I have personally followed Hadoop: The Definitive Guide (O'Reilly) as well as <a href="http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html" rel="nofollow">http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html</a>.</li>
</ol>
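<p>The streaming command above refers to a reducer.py that the answer does not show. As a minimal sketch only, and not the author's reducer, the script below assumes the goal is to pass each distinct mapper record through once. The names <code>unique_records</code> and <code>main</code> are hypothetical, and the logic relies on Hadoop streaming sorting mapper output before the reduce phase, so that identical records reach the reducer on adjacent lines.</p>

```python
#!/usr/bin/env python
# reducer.py: hypothetical sketch, not part of the original answer.
import sys


def unique_records(lines):
    """Yield each distinct non-empty record once.

    Assumes duplicates arrive on adjacent lines, which holds when the
    input has been sorted before it reaches the reducer.
    """
    previous = None
    for line in lines:
        # strip() also drops trailing whitespace such as a key/value tab.
        record = line.strip()
        if not record or record == previous:
            continue
        previous = record
        yield record


def main(stream=None):
    """Print the deduplicated records from stream (stdin by default)."""
    source = stream if stream is not None else sys.stdin
    for record in unique_records(source):
        print(record)
```

<p>To use this as the streaming reducer, end the script with a call to <code>main()</code>. If duplicates should be kept, a reducer that simply echoes its input would do instead.</p>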