Note that there are some explanatory texts on larger screens.

plurals
  1. POSQL: Select transactions where rows are not of criteria inside the same table
    text
    copied!<p>I have a table with transactions:</p> <pre><code>Transactions ------------ id | account | type | date_time | amount ---------------------------------------------------- 1 | 001 | 'R' | '2012-01-01 10:01:00' | 1000 2 | 003 | 'R' | '2012-01-02 12:53:10' | 1500 3 | 003 | 'A' | '2012-01-03 13:10:01' | -1500 4 | 002 | 'R' | '2012-01-03 17:56:00' | 2000 5 | 001 | 'R' | '2012-01-04 12:30:01' | 1000 6 | 002 | 'A' | '2012-01-04 13:23:01' | -2000 7 | 003 | 'R' | '2012-01-04 15:13:10' | 3000 8 | 003 | 'R' | '2012-01-05 12:12:00' | 1250 9 | 003 | 'A' | '2012-01-06 17:24:01' | -1250 </code></pre> <p>and I wish to select all of certain type ('R'), but not those that immediatly (in order of the date_time field) have another transaction of another type ('A') for the same account filed...</p> <p>So, the query should throw the following rows, given the previous example:</p> <pre><code>id | account |type | date | amount ---------------------------------------------------- 1 | 001 | 'R' | '2012-01-01 10:01:00' | 1000 5 | 001 | 'R' | '2012-01-04 12:30:01' | 1000 7 | 003 | 'R' | '2012-01-04 15:13:10' | 3000 </code></pre> <p>(As you can see, row 2 isn't displayed because row 3 'cancels' it... also row 4 is 'cancelled' by row 6'; Row 7 do appears (even though the account 003 belongs to cancelled row #2, this time in row 7 it's not cancelled by any 'A' row); And row 8 won't appear (it's too for 003 account since now this one is cancelled by 9, which doesn't cancels 7 too, just the previouse one: 8...</p> <p>I have tried Joins, subqueries in Where clauses but I'm really not sure how do I must make my query...</p> <p>What I have tried:</p> <p>Trying joins:</p> <pre><code> SELECT trans.type as type, trans.amount as amount, trans.date_time as dt, trans.account as acct, FROM Transactions trans INNER JOIN ( SELECT t.type AS type, t.acct AS acct, t.date_time AS date_time FROM Transactions t WHERE t.date_time &gt; trans.date_time ORDER BY t.date_time DESC ) AS nextTrans ON nextTrans.acct = trans.acct WHERE trans.type IN ('R') AND nextTrans.type NOT IN ('A') ORDER BY DATE(trans.date_time) ASC </code></pre> <p>This throws an error, since I can't introduce external values to the JOIN in MySQL.</p> <p>Trying subquery in where:</p> <pre><code> SELECT trans.type as type, trans.amount as amount, trans.date_time as dt, trans.account as acct, FROM Transactions trans WHERE trans.type IN ('R') AND trans.datetime &lt; ( SELECT t.date_time AS date_time FROM Transactions t WHERE t.account = trans.account ORDER BY t.date_time DESC ) AS nextTrans ON nextTrans.acct = trans.acct ORDER BY DATE(trans.date_time) ASC </code></pre> <p>This is wrong, I can get to introduce external values to the WHERE in MySQL but I cannot manage to find the way to filter correctly for what I need...</p> <p>IMPORTANT EDIT:</p> <p>I managed to achieve a solution, but it now needs serious optimization. Here it is:</p> <pre><code>SELECT * FROM (SELECT t1.*, tFlagged.id AS cancId, tFlagged.type AS cancFlag FROM transactions t1 LEFT JOIN (SELECT t2.* FROM transactions t2 ORDER BY t2.date_time ASC ) tFlagged ON (t1.account=tFlagged.account AND t1.date_time &lt; tFlagged.date_time) WHERE t1.type = 'R' GROUP BY t1.id) tCanc WHERE tCanc.cancFlag IS NULL OR tCanc.cancFlag &lt;&gt; 'A' </code></pre> <p>I joined the table with itself, just considering same account and great date_time. The Join goes ordered by date_time. Grouping by id I managed to get only the first result of the join, which happens to be the next transaction for the same account.</p> <p>Then on the outer select, I filter out those that have an 'A', since that means that the next transaction was effectively a cancelation for it. In other words, if there is no next transaction for the same account or if the next transaction is an 'R', then it is not cancelled and it must be shown in the result...</p> <p>I got this:</p> <pre><code>+----+---------+------+---------------------+--------+--------+----------+ | id | account | type | date_time | amount | cancId | cancFlag | +----+---------+------+---------------------+--------+--------+----------+ | 1 | 001 | R | 2012-01-01 10:01:00 | 1000 | 5 | R | | 5 | 001 | R | 2012-01-04 12:30:01 | 1000 | NULL | NULL | | 7 | 003 | R | 2012-01-04 15:13:10 | 3000 | 8 | R | +----+---------+------+---------------------+--------+--------+----------+ </code></pre> <p>It relates each transaction with the next one in time for the same account and then filters out those that have been cancelled... Success!!</p> <p>As I said, the problem now is optimization. My real data has a lot of rows (as a table holding transactions through time is expected to have), and for a table of ~10,000 rows right now, I got a positive result with this query in 1min.44sec. I suppose that's the thing with joins... (For those who know the protocol in here, what should I do? launch a new question here and post this as a solution to this one? Or just wait for more answers here?)</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload