StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
15053711
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2013-02-24T16:17:12.140
FavoriteCount
0
LastActivityDate
2013-02-25T00:46:16.823
LastEditDate
2013-02-25T00:46:16.823
LastEditorUserId
189946
OwnerUserId
189946
ParentId
15052578
PostTypeId
2
Score
6
ViewCount
0
LastEditorDisplayName
text
Body
I don't have a full answer to why the times for lapply are not linear, but notice that you are using lapply as a glorified loop by applying over a list of variable names (and then using these names to index the actual values) rather than applying over a list of values. I'm not sure how to solve this using lapply, but I can offer an alternative that is (roughly) a liner function of the number of variables. Start by creating the example data: <pre><code>Genedata <- structure(list(time = c(120, 120, 28.96, 119.21, 59.53, 68.81, 82.29, 110.82, 65.88, 84.13, 16.47, 89.75, 76.07, 67.82, 52.24, 64.1, 55.13, 57.79, 50.1, 43.79, 30.27, 3.64, 36.59, 20.02, 118.58, 55.33, 120, 120, 120, 106.94, 11.7, 28.79, 26.82, 52.5, 24.03, 88.99, 102.44, 33.73, 85.28, 26.53, 37.31, 86.43, 55.92, 70.65, 76.04, 79.13, 83.99, 80.25, 40.6, 95.37, 31.06, 37.31, 22.02, 63.25, 34.09, 52.14, 66.04, 59.96, 47.86, 58.45, 45.99, 60.42, 37.67, 15.71, 39.25, 49.87, 25.24, 44.97, 35.9, 26.66, 36.78, 41.52, 22.22, 33.2, 39.68, 25.61, 83.99, 15.05, 8.38, 18.18, 27.15, 24.26, 105.17, 11.76, 70.45, 95.07, 112.33, 27.51, 45.92, 54.04, 103.98, 6.11, 99.51, 20.44, 76.6, 10.02, 41.45, 96.26, 85.61, 78.87, 22.25, 74.53, 59.07, 47.8, 5.68, 79.39, 74.07, 50.76, 67.82, 70.84, 50.59, 34.58, 38.72, 54.9, 67.92, 58.45, 59.34, 54.54, 19.03, 26.26, 52.86, 32.05, 55.95, 56.67, 50.43, 4.24, 49.11, 49.57, 50.49, 63.05, 49.24, 0.92, 31.36, 40.3, 116.64, 31.92, 19.98, 24.62, 18.54, 120, 17.62, 5.32, 2.36, 5.72, 15.28, 95.07, 4.96, 28.89, 3.25, 0.92, 18.77, 57.73, 14.39, 5.12, 4.99, 33.17, 50.53, 5.72, 8.02, 34.02, 1.44, 36.95, 60.75, 37.44, 23.07, 19.85, 31.85, 8.61, 42.27, 15.25, 14.56, 9.96, 8.94, 32.67, 2.07, 22.78, 22.35), cens = c(0L, 0L, NA, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, NA, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, NA, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, NA, 1L, 1L, 1L, 0L, 0L, 0L, NA, 0L, NA, NA, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, NA, NA, 0L, 0L, 0L, 0L, 1L, NA, 0L, 1L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L, 1L, NA, 0L, NA, 1L, 1L, 0L, 1L, 0L, 1L, 1L, NA, 1L, 0L, 1L, 1L, 0L)), .Names = c("time", "cens"), class = "data.frame", row.names = c(NA, -177L)) data <- replicate(10000, abs(rnorm(177))) Genedata <- data.frame(Genedata, data) </code></pre> My approach is the reshape the data using the <code>melt</code> function in the reshape2 package, then use <code>dlply</code> from the plyr package to run the model on each subset. <pre><code>ibrary(maxstat) library(plyr) library(reshape2) system.time({ dat.m <- melt(Genedata, id.vars=1:3) maxstat.estimate <- dlply(dat.m, .(variable), function(DF) { maxstat.test(Surv(time, cens) ~ value, smethod="LogRank", pmethod="Lau94", data=DF)})}) </code></pre> Some timings from this approach are as follows: <pre><code>## 1000 variables ## user system elapsed ## 11.613 0.090 11.780 ## 2000 variables ## user system elapsed ## 23.847 0.060 24.043 ## 3000 variables ## user system elapsed ## 35.316 0.093 35.905 ## 10000 variables ## user system elapsed ## 120.654 0.594 122.140 </code></pre> showing that system time is roughly a linear function of the number of variables. The full 43000 variables takes just over 10 minutes, just enough time for a cup of coffee: <pre><code>## 43000 ## user system elapsed ## 612.587 3.134 630.464 </code></pre>
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POLapply slows with increasing loops
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USIsta
UserOwnerUserId
1. USIsta
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POLapply slows with increasing loops
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COOk problem solved :) cheers. It works well, surprising considering lapply is usually suggested for large datasets.
 singulars
 PostPostId
 PO
 UserUserId
 USPyPer User

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.