Integration testing Hive jobs
<p>I'm trying to write a non-trivial Hive job using the Hive Thrift and JDBC interfaces, and I'm having trouble setting up a decent JUnit test. By non-trivial, I mean a job that results in at least one MapReduce stage, as opposed to one that only deals with the metastore.</p>
<p>The test should fire up a Hive server, load some data into a table, run a non-trivial query on that table, and check the results.</p>
<p>I've wired up a Spring context according to the <a href="http://static.springsource.org/spring-hadoop/docs/1.0.0.RELEASE/reference/html/hive.html" rel="noreferrer" title="Spring Hive context">Spring reference</a>. However, the job fails in the MapReduce phase, complaining that no Hadoop binary exists:</p>
<blockquote> <p>java.io.IOException: Cannot run program "/usr/bin/hadoop" (in directory "/Users/yoni/opower/workspace/intellij_project_root"): error=2, No such file or directory</p> </blockquote>
<p>The problem is that the Hive server is running in-memory but relies on a local Hadoop installation in order to run. For my project to be self-contained, I need the Hive services to be embedded, including the HDFS and MapReduce clusters. I've tried starting a Hive server using the same Spring method and pointing it at <a href="http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/hdfs/MiniDFSCluster.html" rel="noreferrer">MiniDFSCluster</a> and <a href="http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.2/org/apache/hadoop/mapred/MiniMRCluster.html" rel="noreferrer">MiniMRCluster</a>, similar to the pattern used in the Hive <a href="http://hive.apache.org/docs/r0.9.0/api/org/apache/hadoop/hive/ql/QTestUtil.html" rel="noreferrer">QTestUtil</a> source and in <a href="http://people.apache.org/~psmith/hbase/sandbox/hbase/hbase-core/testapidocs/org/apache/hadoop/hbase/HBaseTestingUtility.html" rel="noreferrer">HBaseTestingUtility</a>. However, I've not been able to get that to work.</p>
<p>After three days of trying to wrangle Hive integration testing, I thought I'd ask the community:</p>
<ol> <li>How do you recommend I integration-test Hive jobs?</li> <li>Do you have a working JUnit example of integration testing Hive jobs using in-memory HDFS, MR, and Hive instances?</li> </ol>
<p>Additional resources I've looked at:</p>
<ul> <li><a href="http://dev.bizo.com/2011/04/hive-unit-testing.html" rel="noreferrer" title="Unit testing Hive">Hive Unit Testing tutorial</a></li> <li><a href="https://github.com/SpringSource/spring-hadoop-samples/tree/master/samples/hive" rel="noreferrer" title="Spring Hive example">the Spring Hive example</a></li> </ul>
<p>Edit: I am fully aware that working against a Hadoop cluster - whether local or remote - makes it possible to run integration tests against a full-stack Hive instance. The problem, as stated, is that this is not a viable solution for effectively testing Hive workflows.</p>
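For reference, the mini-cluster pattern I've been attempting (following the QTestUtil/HBaseTestingUtility approach mentioned above) looks roughly like this. This is a sketch against Hadoop 0.20-era APIs, not a verified working test; the MiniDFSCluster and MiniMRCluster constructors come from the Javadoc linked above, while the exact Hive wiring (which config keys to copy across) is an assumption on my part:

```java
// Sketch of a QTestUtil-style mini-cluster setup (Hadoop 0.20-era APIs).
// Requires hadoop-core, hadoop-test, and hive-exec jars on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.mapred.MiniMRCluster;

public class HiveMiniClusterFixture {
    private MiniDFSCluster dfs;
    private MiniMRCluster mr;

    public HiveConf setUp() throws Exception {
        Configuration conf = new Configuration();
        // 2 datanodes, format the filesystem, default rack topology.
        dfs = new MiniDFSCluster(conf, 2, true, null);
        FileSystem fs = dfs.getFileSystem();
        // 2 tasktrackers pointed at the in-memory namenode.
        mr = new MiniMRCluster(2, fs.getUri().toString(), 1);

        // Hand Hive a conf that targets the mini clusters, so query
        // execution does not shell out to a local /usr/bin/hadoop.
        HiveConf hiveConf = new HiveConf(conf, HiveMiniClusterFixture.class);
        hiveConf.set("fs.default.name", fs.getUri().toString());
        hiveConf.set("mapred.job.tracker",
                     mr.createJobConf().get("mapred.job.tracker"));
        return hiveConf;
    }

    public void tearDown() {
        if (mr != null) mr.shutdown();
        if (dfs != null) dfs.shutdown();
    }
}
```

The part I haven't been able to make work is getting the Spring-launched Hive server to actually pick up this HiveConf instead of falling back to the local Hadoop binary.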