StackOverflow2013

plurals

PO
primarykey
Id
13343152
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
14
CommunityOwnedDate
CreationDate
2012-11-12T11:46:28.570
FavoriteCount
0
LastActivityDate
2012-11-16T13:55:21.710
LastEditDate
2012-11-16T13:55:21.710
LastEditorUserId
1166476
OwnerUserId
1166476
ParentId
13287069
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
<p>I code all my Hadoop MR jobs in Python, but you do not need Python to move the data. Use Sqoop: <a href="http://sqoop.apache.org/" rel="nofollow">http://sqoop.apache.org/</a></p>
<p>Sqoop is an open-source tool that lets users extract data from a relational database into Hadoop for further processing, and it is very simple to use. All you need to do is:</p>
<ol>
<li>Download and configure Sqoop</li>
<li>Create your MySQL table schema</li>
<li>Specify the Hadoop HDFS file name, the result table name and the column separator.</li>
</ol>
<p>Read this for more info: <a href="http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html" rel="nofollow">http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html</a></p>
<p>The advantage of using Sqoop is that we can move our HDFS data into any type of relational database (mysql, derby, hive, etc.) and vice versa with a single-line command.</p>
<p>For your use case, please make the necessary changes:</p>
<p>mapper.py</p>
<pre><code>#!/usr/bin/env python
import sys

for line in sys.stdin:
    line = line.strip()
    # Only lines that contain a &lt;row element carry data
    if line.find("&lt;row") != -1:
        words = line.split(' ')
        # Take the text between the first pair of double quotes in each token
        campaignID = words[1].split('"')[1]
        adGroupID = words[2].split('"')[1]
        print("%s:%s:" % (campaignID, adGroupID))
</code></pre>
<p>streaming command</p>
<pre><code>bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar \
    -file /path/to/mapper.py -mapper /path/to/mapper.py \
    -file /path/to/reducer.py -reducer /path/to/reducer.py \
    -input /user/input -output /user/output
</code></pre>
<p>mysql</p>
<pre><code>create database test;
use test;
create table testtable (
    a varchar(100),
    b varchar(100)
);
</code></pre>
<p>sqoop</p>
<pre><code>./sqoop export --connect jdbc:mysql://localhost/test --username root --table testtable --export-dir /user/output --input-fields-terminated-by ':'
</code></pre>
<p>Note:</p>
<ol>
<li>Please change the mapper as per your need.</li>
<li>I have used ':' as my column separator in both the mapper and the sqoop command. Change it as needed.</li>
<li>Sqoop tutorials: I have personally followed Hadoop: The Definitive Guide (O'Reilly) as well as <a href="http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html" rel="nofollow">http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html</a>.</li>
</ol>
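The streaming command in the answer refers to a reducer.py that is not shown. As a rough sketch only, and not the author's reducer, the following hypothetical reducer passes each distinct mapper line through once. It assumes that Hadoop streaming delivers identical lines next to each other after the sort phase, and that stripping whitespace is harmless for the `campaignID:adGroupID:` records. The names `unique_lines` and `main` are made up for this illustration.

```python
#!/usr/bin/env python
"""Hypothetical reducer.py sketch: emit each distinct mapper line once."""
import sys


def unique_lines(lines):
    """Yield each line once, assuming equal lines arrive adjacent (sorted)."""
    previous = None
    for raw in lines:
        # strip() drops the newline and any trailing tab or spaces
        line = raw.strip()
        if line and line != previous:
            yield line
            previous = line


def main():
    """Read sorted mapper output from stdin and print the distinct lines."""
    for line in unique_lines(sys.stdin):
        print(line)
```

To use this as a streaming reducer, the file would end with a call to `main()`. If duplicates are acceptable in the target table, a reducer may not be needed at all for this job.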
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO How to save data from hadoop to database using python
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US Nicole Hu
UserOwnerUserId
1. US Nicole Hu
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PO How to save data from hadoop to database using python
 singulars
 PostTypePostTypeId
 PT Question
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT AcceptedByOriginator
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.