Performance: Pig vs Hive
<p>I have discovered some significant performance differences (in wall-clock runtime as well as CPU time) between Pig and Hive, and I am looking for ways to get to the bottom of them. I have used both languages' explain features (i.e. Hive: EXPLAIN keyword, Pig: pig -e 'explain -script explain.pig') to compare the generated syntax trees and the logical, physical and map-reduce plans. However, both seem to do the same things. The job tracker, on the other hand, shows a difference in the number of map and reduce tasks launched (I then made sure that both use the same number of map and reduce tasks, and the performance difference remains). My question therefore is: in what other ways can I analyze what is going on (possibly at a lower level / bytecode level)?</p>
<p>EDIT: I am running the TPC-H benchmarks by the TPC (available at <a href="https://issues.apache.org/jira/browse/PIG-2397" rel="nofollow">https://issues.apache.org/jira/browse/PIG-2397</a> and <a href="https://issues.apache.org/jira/browse/HIVE-600" rel="nofollow">https://issues.apache.org/jira/browse/HIVE-600</a> ). However, even simpler scripts show quite a large performance difference. For example:</p>
<pre><code>SELECT (dataset.age * dataset.gpa + 3) AS F1, (dataset.age/dataset.gpa - 1.5) AS F2 FROM dataset WHERE dataset.gpa &gt; 0; </code></pre>
<p>I still need to fully evaluate the TPC-H benchmarks (will update later); the results for the simpler scripts are detailed in this document: <a href="https://www.dropbox.com/s/16u3kx852nu6waw/output.pdf" rel="nofollow">https://www.dropbox.com/s/16u3kx852nu6waw/output.pdf</a></p>
<p>(jpg: <a href="http://i.imgur.com/1j1rCWS.jpg" rel="nofollow">http://i.imgur.com/1j1rCWS.jpg</a> )</p>
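For reference, the arithmetic in the example query above can be sketched in plain Python. This is only meant as a small check that the Pig and Hive versions of a script compute the same thing before their timings are compared. The function name `project` and the sample rows are made up for illustration and are not taken from the benchmark data.

```python
def project(rows):
    """Mirror the example SELECT: keep rows with gpa > 0, then compute F1 and F2."""
    out = []
    for row in rows:
        age, gpa = row["age"], row["gpa"]
        if gpa > 0:  # WHERE dataset.gpa > 0
            out.append({
                "F1": age * gpa + 3,    # dataset.age * dataset.gpa + 3
                "F2": age / gpa - 1.5,  # dataset.age / dataset.gpa - 1.5
            })
    return out


# Hypothetical sample rows, not real benchmark data.
sample = [
    {"age": 20, "gpa": 4.0},
    {"age": 30, "gpa": 0.0},  # dropped by the gpa > 0 filter
    {"age": 18, "gpa": 2.0},
]
result = project(sample)
```

Running both engines on a small input like this and diffing their output against such a reference would rule out the possibility that the two scripts are doing different work.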
 
