
Parsing and writing log data from MapReduce to Hive
<p>I've written a small Hadoop map program to parse (regex) information from log files generated by other apps. I found this article: <a href="http://www.nearinfinity.com//blogs/stephen_mouring_jr/2013/01/04/writing-hive-tables-from-mapreduce.html" rel="nofollow">http://www.nearinfinity.com//blogs/stephen_mouring_jr/2013/01/04/writing-hive-tables-from-mapreduce.html</a>. It explains how to parse log data and write it into a Hive table.</p> <p>Here is my code:</p> <pre><code>import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ParseDataToDB {
    // Hive's default delimiters: Ctrl-A (\u0001) between fields,
    // Ctrl-B (\u0002) between array elements.
    public static final String SEPARATOR_FIELD = new String(new char[] {1});
    public static final String SEPARATOR_ARRAY_VALUE = new String(new char[] {2});
    public static final BytesWritable NULL_KEY = new BytesWritable();

    public static class MyMapper
            extends Mapper&lt;LongWritable, Text, BytesWritable, Text&gt; {
        private Text word = new Text();
        private ArrayList&lt;String&gt; bazValues = new ArrayList&lt;String&gt;();

        public void map(LongWritable key, Text value,
                        OutputCollector&lt;BytesWritable, Text&gt; context)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                if (word.find("extract") &gt; -1) {
                    bazValues.add(line);
                }
            }

            // Build up the array values as a delimited string.
            StringBuilder bazValueBuilder = new StringBuilder();
            int i = 0;
            for (String bazValue : bazValues) {
                bazValueBuilder.append(bazValue);
                ++i;
                if (i &lt; bazValues.size()) {
                    bazValueBuilder.append(SEPARATOR_ARRAY_VALUE);
                }
            }

            // Build up the column values / fields as a delimited string.
            String hiveRow = "fooValue" + SEPARATOR_FIELD
                    + "barValue" + SEPARATOR_FIELD
                    + bazValueBuilder.toString();

            // Emit a null key and a Text object containing the delimited fields.
            context.collect(NULL_KEY, new Text(hiveRow));
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "MyTest");
        job.setJarByClass(ParseDataToDB.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(BytesWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</code></pre> <p>But when I run this app, I get an error saying "expected BytesWritable but received LongWritable". Can someone tell me what I'm doing wrong? I'm new to Hadoop programming. I'm also open to creating an external table and pointing it at HDFS, but again I'm struggling with the implementation. Thanks.</p>
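The row-building step in the mapper (scalar fields joined by Ctrl-A, array elements joined by Ctrl-B) can be exercised independently of Hadoop. Below is a minimal, self-contained sketch of that concatenation logic; the class name `HiveRowBuilder` and the sample values are illustrative, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

public class HiveRowBuilder {
    // Hive's default delimiters: field separator is Ctrl-A (\u0001),
    // array-element separator is Ctrl-B (\u0002).
    static final String SEPARATOR_FIELD = "\u0001";
    static final String SEPARATOR_ARRAY_VALUE = "\u0002";

    // Join two scalar columns and one array column into a single
    // delimited Hive row, mirroring the mapper's concatenation.
    static String buildRow(String foo, String bar, List<String> bazValues) {
        return foo + SEPARATOR_FIELD
                + bar + SEPARATOR_FIELD
                + String.join(SEPARATOR_ARRAY_VALUE, bazValues);
    }

    public static void main(String[] args) {
        String row = buildRow("fooValue", "barValue",
                Arrays.asList("log line 1", "log line 2"));
        // Splitting on the field separator yields the three columns.
        System.out.println(row.split(SEPARATOR_FIELD, -1).length); // 3
    }
}
```

`String.join` (Java 8+) replaces the manual `StringBuilder` loop from the question and never emits a trailing separator, which is the property the loop's `if (i < bazValues.size())` guard was enforcing.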
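For the external-table route mentioned at the end of the question, a sketch of the matching Hive DDL is below. The table name, column names, and the `LOCATION` path are all assumptions to be replaced; the `'\001'` and `'\002'` terminators are Hive's defaults and line up with `SEPARATOR_FIELD` and `SEPARATOR_ARRAY_VALUE` in the mapper.

```sql
-- Hypothetical table/column names; adjust to the fields the mapper emits.
CREATE EXTERNAL TABLE parsed_logs (
  foo STRING,
  bar STRING,
  baz ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'        -- matches SEPARATOR_FIELD (Ctrl-A)
  COLLECTION ITEMS TERMINATED BY '\002'  -- matches SEPARATOR_ARRAY_VALUE (Ctrl-B)
STORED AS TEXTFILE
LOCATION '/user/hadoop/parsed_logs';  -- assumption: the MapReduce job's output dir
```

With this in place, pointing `FileOutputFormat.setOutputPath` at the table's `LOCATION` makes the job's output queryable without a separate load step.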
 
