StackOverflow2013

for loop too slow in R
<p>I have to work with two data frames of 2 million records each. I used a for loop to look up values from one in the other, but it is too slow. I've created an example to demonstrate what I need to do.</p>
<pre><code>ratings = data.frame(id = c(1,2,2,3,3),
                     rating = c(1,2,3,4,5),
                     timestamp = c("2006-11-07 15:33:57","2007-04-22 09:09:16","2010-07-16 19:47:45","2010-07-16 19:47:45","2006-10-29 04:49:05"))
stats = data.frame(primeid = c(1,1,1,2),
                   period = c(1,2,3,4),
                   user = c(1,1,2,3),
                   id = c(1,2,3,2),
                   timestamp = c("2011-07-01 00:00:00","2011-07-01 00:00:00","2011-07-01 00:00:00","2011-07-01 00:00:00"))

ratings$timestamp = strptime(ratings$timestamp, "%Y-%m-%d %H:%M:%S")
stats$timestamp = strptime(stats$timestamp, "%Y-%m-%d %H:%M:%S")

for (i in 1:nrow(stats)) {
  cat("Processing ", i, " ...\r\n")
  temp = ratings[ratings$id == stats$id[i],]
  stats$idrating[i] = max(temp$rating[temp$timestamp < stats$timestamp[i]])
}
</code></pre>
<p>Can someone suggest an alternative? I know apply may work, but I have no idea how to translate the for loop.</p>
<p>UPDATE: Thank you for the help. Here is more information.</p>
<p>The table stats has unique combinations of primeid, period, user, id. The table ratings has multiple records per id, with different ratings and timestamps.</p>
<p>What I want to do is the following: for each id in stats, find all the records in ratings with the same id, and then get the max rating among those whose timestamp is earlier than the timestamp given in stats.</p>
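One possible loop-free way to express the lookup described above, as a minimal sketch in base R rather than a definitive answer: join stats to ratings on id, keep only the ratings dated before the stats timestamp, then take the max per stats row. The helper column `row` and the names `joined` and `best` are introduced here for illustration and are not from the original post. The sketch also assumes timestamps are stored as `POSIXct` (not the `POSIXlt` that `strptime` returns) and that a row with no earlier rating should get `NA`, whereas the loop would produce `-Inf` with a warning.

```r
# Same example data as in the question, with timestamps parsed as POSIXct
# (assumption: UTC; use whatever time zone the real data is in).
ratings <- data.frame(
  id = c(1, 2, 2, 3, 3),
  rating = c(1, 2, 3, 4, 5),
  timestamp = as.POSIXct(c("2006-11-07 15:33:57", "2007-04-22 09:09:16",
                           "2010-07-16 19:47:45", "2010-07-16 19:47:45",
                           "2006-10-29 04:49:05"), tz = "UTC")
)
stats <- data.frame(
  primeid = c(1, 1, 1, 2),
  period = c(1, 2, 3, 4),
  user = c(1, 1, 2, 3),
  id = c(1, 2, 3, 2),
  timestamp = as.POSIXct(rep("2011-07-01 00:00:00", 4), tz = "UTC")
)

# Hypothetical helper column so each stats row can be identified after the join.
stats$row <- seq_len(nrow(stats))

# Pair every stats row with every rating that has the same id.
joined <- merge(stats[, c("row", "id", "timestamp")], ratings,
                by = "id", suffixes = c(".stats", ".rating"))

# Keep only ratings strictly earlier than the stats timestamp.
joined <- joined[joined$timestamp.rating < joined$timestamp.stats, ]

# Max rating per stats row; rows without any earlier rating stay NA.
best <- tapply(joined$rating, joined$row, max)
stats$idrating <- NA_real_
stats$idrating[as.integer(names(best))] <- as.numeric(best)
stats$row <- NULL

print(stats)
```

The join materialises one row per (stats row, matching rating) pair, so with 2 million rows on each side its memory use depends on how many ratings each id has. If that intermediate table is too large, the same idea can be applied to chunks of stats, or with a package that supports joining on an inequality directly.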
