StackOverflow2013

plurals

Linear time complexity ranking algorithm when the orders are precomputed
primarykey
Id
12077666
data
AcceptedAnswerId
12079996
AnswerCount
1
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2012-08-22T16:40:05.037
FavoriteCount
0
LastActivityDate
2012-08-22T19:20:12.393
LastEditDate
2012-08-22T18:01:02.940
LastEditorUserId
1427219
OwnerUserId
1427219
ParentId
0
PostTypeId
1
Score
0
ViewCount
622
LastEditorDisplayName
text
Body
I am trying to write an efficient ranking algorithm in C++, but I will present my case in R as it is far easier to understand this way.
<pre><code>> samples_x <- c(4, 10, 9, 2, NA, 3, 7, 1, NA, 8)
> samples_y <- c(5, 7, 9, NA, 1, 4, NA, 8, 2, 10)
> orders_x <- order(samples_x)
> orders_y <- order(samples_y)
> cbind(samples_x, orders_x, samples_y, orders_y)
      samples_x orders_x samples_y orders_y
 [1,]         4        8         5        5
 [2,]        10        4         7        9
 [3,]         9        6         9        6
 [4,]         2        1        NA        1
 [5,]        NA        7         1        2
 [6,]         3       10         4        8
 [7,]         7        3        NA        3
 [8,]         1        2         8       10
 [9,]        NA        5         2        4
[10,]         8        9        10        7
</code></pre>
Suppose the above is already precomputed. Performing a simple ranking on each of the sample sets then takes linear time (the result is much like that of the <code>rank</code> function):
<pre><code>> ranks_x <- rep(0, length(samples_x))
> for (i in 1:length(samples_x)) ranks_x[orders_x[i]] <- i
</code></pre>
For a project at work, it would be useful for me to emulate the following behaviour in linear time:
<pre><code>> cc <- complete.cases(samples_x, samples_y)
> ranks_x <- rank(samples_x[cc])
> ranks_y <- rank(samples_y[cc])
</code></pre>
The <code>complete.cases</code> function, when given n sets of the same length, returns a logical vector flagging the indices at which none of the sets contain NAs. The <code>order</code> function returns the permutation of indices corresponding to the sorted sample set. The <code>rank</code> function returns the ranks of the sample set.
How can I do this? Let me know if I have provided sufficient information about the problem in question.
More specifically, I am trying to build a correlation matrix based on Spearman's rank correlation coefficient in such a way that NAs are handled properly. The presence of NAs requires that the rankings be calculated for every pairwise sample set (<code>s n^2 log n</code>); I am trying to avoid that by calculating the orders once for every sample set (<code>s n log n</code>) and using a linear-time step for every pairwise comparison. Is this even doable?
Thanks in advance.
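For illustration, here is a minimal C++ sketch of one way the pairwise step could run in linear time. It is not necessarily the approach taken in the accepted answer. It walks the precomputed order once and hands out consecutive ranks only to indices that are complete in both sample sets. The names `complete_cases` and `ranks_from_order` are hypothetical. The sketch assumes that NA is encoded as NaN, that orders are 0-based (R's are 1-based), and that there are no ties (R's `rank` averages tied values by default, which this does not reproduce).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: flags the indices at which neither sample set is
// missing. Assumes NA is represented as NaN.
std::vector<bool> complete_cases(const std::vector<double>& x,
                                 const std::vector<double>& y) {
    std::vector<bool> cc(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        cc[i] = !std::isnan(x[i]) && !std::isnan(y[i]);
    }
    return cc;
}

// Hypothetical helper: given a precomputed ascending order (0-based indices)
// and the complete-case flags, assigns ranks 1..m to the m complete cases in
// a single pass. Incomplete cases keep rank 0. Ties are not averaged.
std::vector<std::size_t> ranks_from_order(const std::vector<std::size_t>& order,
                                          const std::vector<bool>& complete) {
    std::vector<std::size_t> ranks(order.size(), 0);
    std::size_t next_rank = 1;
    for (std::size_t idx : order) {
        if (complete[idx]) {
            ranks[idx] = next_rank++;  // skipped indices do not consume a rank
        }
    }
    return ranks;
}
```

With the sample data from the question, the non-zero entries of the result should correspond to `rank(samples_x[cc])` and `rank(samples_y[cc])`. Each pair of sample sets then costs one pass to build the flags and one pass per set to assign ranks, while the sorting is done only once per sample set.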
Tags
<algorithm><r><correlation><missing-data>
Title
Linear time complexity ranking algorithm when the orders are precomputed
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 Answer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. Question
UserLastEditorUserId
1. Nicolas De Jay
UserOwnerUserId
1. Nicolas De Jay
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 Answer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId