StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>A very unorthodox way of approaching this would be to implement the matrix as a <em>k</em>d-tree, with <em>k</em>=2. You would delete a row or column by visiting all cells that intersect that row or column; if the matrix is square and it has <code>n</code> nonzero entries, the average number of cells you'll need to examine is <code>sqrt(n)</code>, I believe. (I have written a proof to that effect in some answer here on Stackoverflow - I can look it up if you need it.)</p> <p>In a very similar vein, you could use a quadtree; the way I understand these terms, the difference is that the cell boundaries are pre-defined to always cut the x- and y-range into halves (for four identical subcells in every non-leaf) in a quadtree, whereas the nodes determine the boundaries in a <em>k</em>d-tree (for two non-identical subcells in every non-leaf).</p> <p>I think for both versions, the performance of this solution depends on the sparsity in a complicated way. First of all, if the data is truly very sparse, as in, the average number of nonzero entries per row/column is much less than one, then this solution will be much more memory-efficient than the one you propose, but probably less time-efficient. If the number of nonzero entries is some constant fraction <code>c</code> of the total number of entries, and your matrix is <code>m * k</code>, this solution might be more efficient: for your solution, to delete one column you would need to change, on average, <code>c*m</code> row lists. For each such row list, you would need to find the spot where your column is and move all entries coming after it, one forward. That is on average <code>c*k/2 = O(k)</code> entries in each row, for a total of <code>c^2*m*k/2 = O(m*k)</code> operations. We would have <code>n = c*m*k</code>, so the average total number of operations would be <code>c*n/2 = O(n)</code> whereas it would be <code>O(sqrt(n))</code> for the solutions proposed here. Similarly, if you assume the matrix is roughly square, say <code>m*m</code>, and each row/column has <code>f(m)</code> nonzero entries on average (so that <code>n = f(m)*m</code>), then the number of operations is <code>O(f(m)^2)</code> for your solution and <code>O(sqrt(m*f(m)))</code> for this one; that means, this solution is better if <code>f(m) = ω(m^(1/3))</code>. (Note that was a <a href="http://en.wikipedia.org/wiki/Big_oh_notation#Family_of_Bachmann.E2.80.93Landau_notations" rel="nofollow">lower case omega</a>; it basically means, <code>f(m)</code> grows asymptotically faster than <code>m^(1/3)</code>, for example like <code>sqrt(m)</code> or <code>c*m</code>.)</p> <p>(I was assuming that each entry of the <code>rows</code> map is an array in your solution; a linked list would give the same complexity because it takes linear time to find the right column in the list. You could do a little better by having each row and column represented, not by an array, but by a self-balancing tree -- then you could get away with <code>O(log(k))</code> operations for each row and <code>O(m * log(k)) = O(sqrt(n)*log(n))</code> in total, assuming the matrix is not too far from being square. Still not better than this tree business, but if you really need the best performance it might be worth implementing to see how it works in practice.)</p> <p>If the density of your matrix is indeed a constant <code>c</code>, then a dense matrix representation would also do <code>O(sqrt(n))</code> operations, so the asymptotic behaviour should be the same. The constant factors would depend on <code>c</code>, so once again, you'd need to implement both in order to be certain which is faster.</p> <p>For the quadtree solution to have good performance, you would additionally need the nonzero values to not be concentrated in a small area; the distribution doesn't need to be particularly even, just not extremely concentrated.</p> <p>If you also expect to do frequent addition and removal of arbitrary entries, the <em>k</em>d-tree is very tricky to do well - I don't think there are easy schemes to make the tree balance itself, like red-black or AVL or similar 1-d trees. A quadtree could still work.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload