Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>Here is a solution based on nested subqueries. First, I added a few rows to catch a few more cases. Transaction 10, for example, should not be cancelled by transaction 12, because transaction 11 comes in between.</p> <pre><code>&gt; select * from transactions order by date_time; +----+---------+------+---------------------+--------+ | id | account | type | date_time | amount | +----+---------+------+---------------------+--------+ | 1 | 1 | R | 2012-01-01 10:01:00 | 1000 | | 2 | 3 | R | 2012-01-02 12:53:10 | 1500 | | 3 | 3 | A | 2012-01-03 13:10:01 | -1500 | | 4 | 2 | R | 2012-01-03 17:56:00 | 2000 | | 5 | 1 | R | 2012-01-04 12:30:01 | 1000 | | 6 | 2 | A | 2012-01-04 13:23:01 | -2000 | | 7 | 3 | R | 2012-01-04 15:13:10 | 3000 | | 8 | 3 | R | 2012-01-05 12:12:00 | 1250 | | 9 | 3 | A | 2012-01-06 17:24:01 | -1250 | | 10 | 3 | R | 2012-01-07 00:00:00 | 1250 | | 11 | 3 | R | 2012-01-07 05:00:00 | 4000 | | 12 | 3 | A | 2012-01-08 00:00:00 | -1250 | | 14 | 2 | R | 2012-01-09 00:00:00 | 2000 | | 13 | 3 | A | 2012-01-10 00:00:00 | -1500 | | 15 | 2 | A | 2012-01-11 04:00:00 | -2000 | | 16 | 2 | R | 2012-01-12 00:00:00 | 5000 | +----+---------+------+---------------------+--------+ 16 rows in set (0.00 sec) </code></pre> <p>First, create a query to grab, for each transaction, "the date of the most recent transaction before that one in the same account":</p> <pre><code>SELECT t2.*, MAX(t1.date_time) AS prev_date FROM transactions t1 JOIN transactions t2 ON (t1.account = t2.account AND t2.date_time &gt; t1.date_time) GROUP BY t2.account,t2.date_time ORDER BY t2.date_time; +----+---------+------+---------------------+--------+---------------------+ | id | account | type | date_time | amount | prev_date | +----+---------+------+---------------------+--------+---------------------+ | 3 | 3 | A | 2012-01-03 13:10:01 | -1500 | 2012-01-02 12:53:10 | | 5 | 1 | R | 2012-01-04 12:30:01 | 1000 | 2012-01-01 10:01:00 | | 6 | 2 | A | 2012-01-04 13:23:01 | -2000 | 2012-01-03 17:56:00 | | 7 | 3 | R | 2012-01-04 15:13:10 | 3000 | 2012-01-03 13:10:01 | | 8 | 3 | R | 2012-01-05 12:12:00 | 1250 | 2012-01-04 15:13:10 | | 9 | 3 | A | 2012-01-06 17:24:01 | -1250 | 2012-01-05 12:12:00 | | 10 | 3 | R | 2012-01-07 00:00:00 | 1250 | 2012-01-06 17:24:01 | | 11 | 3 | R | 2012-01-07 05:00:00 | 4000 | 2012-01-07 00:00:00 | | 12 | 3 | A | 2012-01-08 00:00:00 | -1250 | 2012-01-07 05:00:00 | | 14 | 2 | R | 2012-01-09 00:00:00 | 2000 | 2012-01-04 13:23:01 | | 13 | 3 | A | 2012-01-10 00:00:00 | -1500 | 2012-01-08 00:00:00 | | 15 | 2 | A | 2012-01-11 04:00:00 | -2000 | 2012-01-09 00:00:00 | | 16 | 2 | R | 2012-01-12 00:00:00 | 5000 | 2012-01-11 04:00:00 | +----+---------+------+---------------------+--------+---------------------+ 13 rows in set (0.00 sec) </code></pre> <p>Use that as a subquery to get each transaction and its predecessor on the same row. Use some filtering to pull out the transactions we're interested in - namely, 'A' transactions whose predecessors are 'R' transactions that they exactly cancel out -</p> <pre><code>SELECT t3.*,transactions.* FROM transactions JOIN (SELECT t2.*, MAX(t1.date_time) AS prev_date FROM transactions t1 JOIN transactions t2 ON (t1.account = t2.account AND t2.date_time &gt; t1.date_time) GROUP BY t2.account,t2.date_time) t3 ON t3.account = transactions.account AND t3.prev_date = transactions.date_time AND t3.type='A' AND transactions.type='R' AND t3.amount + transactions.amount = 0 ORDER BY t3.date_time; +----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+ | id | account | type | date_time | amount | prev_date | id | account | type | date_time | amount | +----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+ | 3 | 3 | A | 2012-01-03 13:10:01 | -1500 | 2012-01-02 12:53:10 | 2 | 3 | R | 2012-01-02 12:53:10 | 1500 | | 6 | 2 | A | 2012-01-04 13:23:01 | -2000 | 2012-01-03 17:56:00 | 4 | 2 | R | 2012-01-03 17:56:00 | 2000 | | 9 | 3 | A | 2012-01-06 17:24:01 | -1250 | 2012-01-05 12:12:00 | 8 | 3 | R | 2012-01-05 12:12:00 | 1250 | | 15 | 2 | A | 2012-01-11 04:00:00 | -2000 | 2012-01-09 00:00:00 | 14 | 2 | R | 2012-01-09 00:00:00 | 2000 | +----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+ 4 rows in set (0.00 sec) </code></pre> <p>From the result above it's apparent we're almost there - we've identified the unwanted transactions. Using <code>LEFT JOIN</code> we can filter these out of the whole transaction set:</p> <pre><code>SELECT transactions.* FROM transactions LEFT JOIN (SELECT transactions.id FROM transactions JOIN (SELECT t2.*, MAX(t1.date_time) AS prev_date FROM transactions t1 JOIN transactions t2 ON (t1.account = t2.account AND t2.date_time &gt; t1.date_time) GROUP BY t2.account,t2.date_time) t3 ON t3.account = transactions.account AND t3.prev_date = transactions.date_time AND t3.type='A' AND transactions.type='R' AND t3.amount + transactions.amount = 0) t4 USING(id) WHERE t4.id IS NULL AND transactions.type = 'R' ORDER BY transactions.date_time; +----+---------+------+---------------------+--------+ | id | account | type | date_time | amount | +----+---------+------+---------------------+--------+ | 1 | 1 | R | 2012-01-01 10:01:00 | 1000 | | 5 | 1 | R | 2012-01-04 12:30:01 | 1000 | | 7 | 3 | R | 2012-01-04 15:13:10 | 3000 | | 10 | 3 | R | 2012-01-07 00:00:00 | 1250 | | 11 | 3 | R | 2012-01-07 05:00:00 | 4000 | | 16 | 2 | R | 2012-01-12 00:00:00 | 5000 | +----+---------+------+---------------------+--------+ </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload