I would say your test scheme is not really useful. To fulfill the db query, the db server goes through several steps:

1. parse the SQL
2. work up a query plan, i.e. decide which indices to use (if any), optimize, etc.
3. if an index is used, search it for the pointers to the actual data, then go to the appropriate location in the data; if no index is used, scan *the whole table* to determine which rows are needed
4. load the data from disk into a temporary location (hopefully, but not necessarily, memory)
5. perform the `count()` and `avg()` calculations

So, creating an array in Python and getting the average basically skips all these steps save the last one. As disk I/O is among the most expensive operations a program has to perform, this is a major flaw in the test (see also the answers to [this question](https://stackoverflow.com/questions/26021/how-is-data-compression-more-effective-than-indexing-for-search-performance) I asked here before). Even if you read the data from disk in your other test, the process is completely different and it's hard to tell how relevant the results are.

To obtain more information about where Postgres spends its time, I would suggest the following tests:

- Compare the execution time of your query to that of a SELECT without the aggregating functions (i.e. cut step 5).
- If you find that the aggregation leads to a significant slowdown, check whether Python does it faster, obtaining the raw data through the plain SELECT from the comparison. (There is a sketch of this comparison at the end of this answer.)

To speed up your query, reduce disk access first. I doubt very much that it's the aggregation that takes the time.

There are several ways to do that:

- Cache data (in memory!) for subsequent access, either via the db engine's own capabilities or with tools like memcached (see the caching sketch at the end of this answer).
- Reduce the size of your stored data.
- Optimize the use of indices. Sometimes this can mean skipping index use altogether (after all, it's disk access, too). For MySQL, I seem to remember that it's recommended to skip indices if you assume that the query fetches more than 10% of all the data in the table.
- If your query makes good use of indices, I know that for MySQL databases it helps to put indices and data on separate physical disks. However, I don't know whether that's applicable for Postgres.
- There might also be more sophisticated problems, such as swapping rows to disk if for some reason the result set can't be completely processed in memory. But I would leave that kind of research alone until I ran into serious performance problems that I couldn't find another way to fix, as it requires knowledge about a lot of little under-the-hood details of your process.

**Update:**

*I just realized that you seem to have no use for indices in the above query, and most likely aren't using any either, so my advice on indices probably wasn't helpful. Sorry. Still, I'd say that the aggregation is not the problem; disk access is. I'll leave the index stuff in anyway; it might still be of some use.*
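To make the first comparison concrete, here is a minimal sketch of the timing test, assuming a hypothetical table `measurements` with a numeric column `value` and psycopg2 as the driver; adjust the names and the connection string to your setup.

```python
# Minimal timing sketch: aggregate in Postgres vs. fetch raw rows and
# aggregate in Python. Table/column names and the DSN are hypothetical.
import time
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

# Variant A: let Postgres do the aggregation (steps 1-5 above).
t0 = time.perf_counter()
cur.execute("SELECT count(*), avg(value) FROM measurements")
db_count, db_avg = cur.fetchone()
t_db = time.perf_counter() - t0

# Variant B: plain SELECT (cuts step 5), then aggregate in Python.
t0 = time.perf_counter()
cur.execute("SELECT value FROM measurements")
rows = cur.fetchall()
t_fetch = time.perf_counter() - t0

t0 = time.perf_counter()
values = [r[0] for r in rows]
py_avg = sum(values) / len(values)
t_py = time.perf_counter() - t0

print(f"aggregate in db:       {t_db:.3f}s (avg = {db_avg})")
print(f"plain SELECT (fetch):  {t_fetch:.3f}s")
print(f"aggregate in Python:   {t_py:.3f}s (avg = {py_avg})")

cur.close()
conn.close()
```

If the plain SELECT already accounts for most of the total time, the bottleneck is disk access and data transfer rather than the aggregation, which is what I suspect.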
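And here is a sketch of the caching idea from the first bullet, using memcached as mentioned in the list. I'm assuming the pymemcache client and a memcached daemon on localhost:11211 purely for illustration; the table and column names are the same hypothetical ones as above.

```python
# Sketch: serve repeated requests for the average from memcached so the
# expensive disk-bound query runs only on a cache miss.
import json
import psycopg2
from pymemcache.client.base import Client  # assumed client library

cache = Client(("localhost", 11211))
CACHE_KEY = "measurements:avg"

def cached_average(conn):
    hit = cache.get(CACHE_KEY)
    if hit is not None:
        return json.loads(hit)  # served from memory, no disk access
    cur = conn.cursor()
    cur.execute("SELECT avg(value) FROM measurements")
    avg = float(cur.fetchone()[0])
    cur.close()
    cache.set(CACHE_KEY, json.dumps(avg), expire=300)  # keep for 5 minutes
    return avg

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
print(cached_average(conn))  # first call hits the db; later calls don't
conn.close()
```

The expiry is a design choice: a short TTL keeps the cached figure roughly current without any invalidation logic; if the table changes rarely, you could instead delete the key whenever the data is written.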