Note that there are some explanatory texts on larger screens.

plurals
  1. POMySQL covariance computation on single table
    text
    copied!<p>I have a single table MySQL database of financial transactions with the following schema:</p> <pre><code>+-----------------+---------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------------+---------------------+------+-----+---------+-------+ | symbol_id | tinyint(3) unsigned | YES | MUL | NULL | | | timestamp | timestamp(6) | YES | MUL | NULL | | | buy_sell | char(1) | YES | | NULL | | | price | decimal(10,6) | YES | MUL | NULL | | +-----------------+---------------------+------+-----+---------+-------+ </code></pre> <p>There are 200 unique <code>symbol_id</code>s. Ultimately I want to be able to compute the running (over time) covariance of the price of all of these pairs. Initially I can settle for just computing the covariance of a single pair and then I can iterate. </p> <p>In order to compute a covariance I need two arrays of equal length (in this case <code>price</code>). I'm struggling with how to write this as a single query, and avoiding having <em>all</em> of the records being returned for me to locally compute the covariance.</p> <p>Here is what I'm trying to accomplish in two <strong>pseudo</strong>-SQL queries:</p> <pre><code>SELECT (AVG(price1*price2) - AVG(price1)*AVG(price2)) as covar FROM data </code></pre> <p>and</p> <pre><code>SELECT price AS price1 WHERE HOUR(timestamp)=1 AND symbol_id=1 LIMIT(MIN(COUNT(price1,price2))) SELECT price AS price2 WHERE HOUR(timestamp)=1 AND symbol_id=2 LIMIT(MIN(COUNT(price1,price2))) </code></pre> <p>The <em>first</em> statement takes two equal length arrays of <code>price1</code> and <code>price2</code> and computes the covariance. The <em>second</em> statement is what is selecting two different types, that all take place within the 1st hour of transactions, and limits the returned values to be equal length.</p> <p>In my limited SQL knowledge, I'm having trouble understanding how I would combine these queries. Any help is much appreciated. <em>Ultimately being able to run one query that computes the pair-wise covariance for a specific block of time would be lovely.</em> </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload