Note that there are some explanatory texts on larger screens.

plurals
  1. POHiveQL and rank()
    text
    copied!<p>I can't understand HiveQL rank(). I've found a couple of implementations of rank UDF's on the WWW, such as <a href="http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive" rel="nofollow">Edward's nice example</a>. I can load and access the functions, but I can't get them to do what I want. Here is a detailed example:</p> <p>Loading the UDF into the CLI process:</p> <pre><code>$ javac -classpath /home/hadoop/hadoop/hadoop-core-1.0.4.jar:/home/hadoop/hive/lib/hive-exec-0.10.0.jar com/m6d/hiveudf/Rank2.java $ jar -cvf Rank2.jar com/m6d/hiveudf/Rank2.class hive&gt; ADD JAR /home/hadoop/MyDemo/Rank2.jar; hive&gt; CREATE TEMPORARY FUNCTION Rank2 AS 'com.m6d.hiveudf.Rank2'; </code></pre> <p>Create a table:</p> <pre><code>create table purchases ( SalesRepId String, PurchaseOrderId INT, Amount INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'; </code></pre> <p>Load data from this CSV:</p> <pre><code>Jana,1,100 Nadia,2,200 Nadia,3,600 Daniel,4,80 Jana,5,120 William,6,170 Daniel,7,140 </code></pre> <p>With this from the CLI:</p> <pre><code>LOAD DATA LOCAL INPATH '/home/hadoop/MyDemo/purchases.csv' INTO TABLE purchases; </code></pre> <p>Now I can see my top Sales Reps:</p> <pre><code>select SalesRepId,sum(amount) as volume from purchases group by SalesRepId ORDER BY volume DESC; </code></pre> <p>Nadia has sold $800 of stuff, Daniel and Jana have both sold $220, and William has sold $170</p> <pre><code>SalesRep Amount -------- ------ Nadia 800 Daniel 220 Jana 220 William 170 </code></pre> <p>Now all I want to do is number them: Nadia is #1, Daniel and Jana are tied for #2, and William is #4 (not #3)</p> <pre><code>select SalesRepId, V.volume,rank2(V.volume) from (select SalesRepId,sum(amount) as volume from purchases group by SalesRepId ORDER BY volume DESC) V; </code></pre> <p>This is what I get, but NOT what I want:</p> <pre><code>SalesRep Amount Rank -------- ------ ---- Nadia 800 1 Daniel 220 1 Jana 220 2 William 170 1 </code></pre> <p>This is what I WANT, but I can't make hive do it for me:</p> <pre><code>SalesRep Amount Rank -------- ------ ---- Nadia 800 1 Daniel 220 2 Jana 220 2 William 170 4 </code></pre> <p>Can you help me with the correct HiveQL to rank my Sales Reps?</p> <p>Thanks to JtheRocker for his response. His change resulted in this list:</p> <pre><code>SalesRep Amount Rank -------- ------ ---- William 170 1 Daniel 220 2 Jana 220 2 Nadia 800 3 </code></pre> <p>A slight modification to show Nadia as 4th (not 3rd):</p> <pre><code>private row_number; @Override public Object evaluate(DeferredObject[] currentKey) throws HiveException { row_number++; if (!sameAsPreviousKey(currentKey)) { this.counter = row_number; copyToPreviousKey(currentKey); } return new Long(this.counter); } </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload