Note that there are some explanatory texts on larger screens.

plurals
  1. POLinear time complexity ranking algorithm when the orders are precomputed
    primarykey
    data
    text
    <p>I am trying to write an efficient ranking algorithm in C++ but I will present my case in R as it is far easier to understand this way.</p> <pre><code>&gt; samples_x &lt;- c(4, 10, 9, 2, NA, 3, 7, 1, NA, 8) &gt; samples_y &lt;- c(5, 7, 9, NA, 1, 4, NA, 8, 2, 10) &gt; orders_x &lt;- order(samples_x) &gt; orders_y &lt;- order(samples_y) &gt; cbind(samples_x, orders_x, samples_y, orders_y) samples_x orders_x samples_y orders_y [1,] 4 8 5 5 [2,] 10 4 7 9 [3,] 9 6 9 6 [4,] 2 1 NA 1 [5,] NA 7 1 2 [6,] 3 10 4 8 [7,] 7 3 NA 3 [8,] 1 2 8 10 [9,] NA 5 2 4 [10,] 8 9 10 7 </code></pre> <p>Suppose the above is already precomputed. Performing a simple ranking on each of the sample sets takes linear time complexity (the result is much like the <code>rank</code> function):</p> <pre><code>&gt; ranks_x &lt;- rep(0, length(samples_x)) &gt; for (i in 1:length(samples_x)) ranks_x[orders_x[i]] &lt;- i </code></pre> <p>For a work project I am working on, it would be useful for me to emulate the following behaviour in linear time complexity:</p> <pre><code>&gt; cc &lt;- complete.cases(samples_x, samples_y) &gt; ranks_x &lt;- rank(samples_x[cc]) &gt; ranks_y &lt;- rank(samples_y[cc]) </code></pre> <p>The <code>complete.cases</code> function, when given n sets of the same length, returns the indices for which none of the sets contain NAs. The <code>order</code> function returns the permutation of indices corresponding to the sorted sample set. The <code>rank</code> function returns the ranks of the sample set.</p> <p>How to do this? Let me know if I have provided sufficient information as to the problem in question.</p> <p>More specifically, I am trying to build a correlation matrix based on Spearman's rank sum correlation coefficient test in a way such that NAs are handled properly. The presence of NAs requires that the rankings be calculated for every pairwise sample set (<code>s n^2 log n</code>); I am trying to avoid that by calculating the orders once for every sample set (<code>s n log n</code>) and use a linear complexity for every pairwise comparison. Is this even doable?</p> <p>Thanks in advance.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload