StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POLoading data from HDFS does not work with Elephantbird
primarykey
Id
17879259
data
AcceptedAnswerId
17944824
AnswerCount
1
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-07-26T10:40:18.090
FavoriteCount
3
LastActivityDate
2014-10-02T15:57:01.633
LastEditDate
2013-07-29T16:56:22.327
LastEditorUserId
1824721
OwnerUserId
1824721
ParentId
0
PostTypeId
1
Score
0
ViewCount
1976
LastEditorDisplayName
text
Body
I am trying to process data with elephantbird in pig but I don't succeed in loading the data. Here is my pig script: <pre><code>register 'lib/elephant-bird-core-3.0.9.jar'; register 'lib/elephant-bird-pig-3.0.9.jar'; register 'lib/google-collections-1.0.jar'; register 'lib/json-simple-1.1.jar'; twitter = LOAD 'statuses.log.2013-04-01-00' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad'); DUMP twitter; </code></pre> The output I get is <pre><code>[main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop1/twitter_test/pig_1374834826168.log [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop1/.pigbootup not found [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.hadoop:8021 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 [main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?) java.lang.NoSuchFieldException: jobsInProgress at java.lang.Class.getDeclaredField(Class.java:1938) at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58) at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177) at org.apache.pig.PigServer.launchPlan(PigServer.java:1266) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251) at org.apache.pig.PigServer.storeEx(PigServer.java:933) at org.apache.pig.PigServer.store(PigServer.java:900) at org.apache.pig.PigServer.openIterator(PigServer.java:813) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:604) at org.apache.pig.Main.main(Main.java:157) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=656085089 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6015425922938886053.jar [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6015425922938886053.jar created [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. [JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 5 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201307261031_0050 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases twitter [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[10,10] C: R: [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://master.hadoop:50030/jobdetails.jsp?jobid=job_201307261031_0050 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307261031_0050 has failed! Stop running all dependent jobs [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 2.0.0-cdh4.3.0 0.11.0-cdh4.3.0 hadoop1 2013-07-26 12:33:48 2013-07-26 12:34:23 UNKNOWN Failed! Failed Jobs: JobId Alias Feature Message Outputs job_201307261031_0050 twitter MAP_ONLY Message: Job failed! hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504, Input(s): Failed to read data from "hdfs://master.hadoop:8020/user/hadoop1/statuses.log.2013-04-01-00" Output(s): Failed to produce result in "hdfs://master.hadoop:8020/tmp/temp971280905/tmp1376631504" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_201307261031_0050 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected Details at logfile: /home/hadoop1/twitter_test/pig_1374834826168.log </code></pre> The file exists and is accessible: <pre><code>$ hdfs dfs -ls /user/hadoop1/statuses.log.2013-04-01-00 Found 1 items -rw-r--r-- 3 hadoop1 supergroup 656085089 2013-07-26 11:53 /user/hadoop1/statuses.log.2013-04-01-00 </code></pre> This seems to be a general problem with the pig version shipped with Cloudera 4.6.0: the problem seems to be the line that says <pre><code>[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected </code></pre> I got a similar error when running another user defined function for loading data: <pre><code>[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected </code></pre> When I force pig to local mode (''-x local'') I get the more obvious error <pre><code>Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected </code></pre> So the version of Hadoop pig uses seems to be incompatible with the one shipped with Cloudera, I guess. 
Tags
<hadoop><apache-pig><cloudera><elephantbird>
Title
Loading data from HDFS does not work with Elephantbird
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USRobert
UserOwnerUserId
1. USRobert
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POLoading data from HDFS does not work with Elephantbird
 UserUserId
 USjulius kolbe
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.