Relative frequency in R by factor
<p>I would like to get a table of the top 10 absolute and relative frequencies of a variable within each level of another factor variable. I have a data frame with 3 columns: the first is a factor variable, the second is the variable I need to count, and the third is a logical variable used as a constraint. (The real database has more than 4 million observations.)</p>
<pre><code>dtf&lt;-data.frame(c("a","a","b","c","b"),c("aaa","bbb","aaa","aaa","bbb"),c(TRUE,FALSE,TRUE,TRUE,TRUE))
colnames(dtf)&lt;-c("factor","var","log")
dtf
  factor var   log
1      a aaa  TRUE
2      a bbb FALSE
3      b aaa  TRUE
4      c aaa  TRUE
5      b bbb  TRUE
</code></pre>
<p>So I need to find the top absolute and relative frequencies of "var" where "log"==TRUE within each level of "factor".</p>
<p>I've tried this with absolute frequencies (in the real database I extract the top 10; here I take 2 lines):</p>
<pre><code>t1&lt;-tapply(dtf$var[dtf$log==T],dtf$factor[dtf$log==T],function(x)(head(sort(table(x),decreasing=T),n=2L)))
# Returns array of lists: list of factors containing list of top frequencies
t2&lt;-(t1, ldply)
# Split list inside by id and freq
t3&lt;-do.call(rbind, lapply(t2, data.frame))
# Returns dataframe of top "var" values and corresponding freq for each group in "factor"
# Factor variable's labels are saved as row.names in t3
</code></pre>
<p>The following function finds the relative frequency with respect to the whole database, not grouped by factor:</p>
<pre><code>getrelfreq&lt;-function(x){
  v&lt;-table(x)
  v_rel&lt;-v/nrow(dtf[dtf$log==T,])
  head(sort(v_rel,decreasing=T),n=2L)}
</code></pre>
<p>But I have problems with the relative frequencies, because I need to divide each absolute frequency by the number of rows of "var" WITHIN EACH factor level, not by the TOTAL number of rows of "var" where "log"==T. I don't know how to do that in a tapply loop so that the denominator is different for each factor level. I would also like to use both functions in a single tapply loop instead of generating many tables and merging the results, but I have no idea how to combine the two functions.</p>
<p>Help please :)</p>
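
One possible way to get a per-group denominator is to split "var" by "factor" after applying the "log" filter, so that the function applied to each piece sees only that group's values and can divide by `length(x)`. The sketch below is only an illustration on the toy data from the question, not a tuned solution for 4 million rows; the helper name `top_freq` and the output column names `abs_freq` and `rel_freq` are made up for this example.

```r
# Toy data from the question
dtf <- data.frame(
  factor = c("a", "a", "b", "c", "b"),
  var    = c("aaa", "bbb", "aaa", "aaa", "bbb"),
  log    = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top-n absolute and relative frequencies of x.
# The denominator is length(x), i.e. the size of the group passed in.
top_freq <- function(x, n = 2L) {
  counts <- sort(table(x), decreasing = TRUE)
  k <- min(n, length(counts))
  data.frame(
    var      = names(counts)[seq_len(k)],
    abs_freq = as.vector(counts)[seq_len(k)],
    rel_freq = as.vector(counts)[seq_len(k)] / length(x),
    stringsAsFactors = FALSE
  )
}

kept     <- dtf[dtf$log, ]                  # apply the logical constraint once
by_group <- split(kept$var, kept$factor)    # one vector of "var" per factor level

# Both frequencies are computed in a single pass over the groups
res <- do.call(rbind, lapply(names(by_group), function(g) {
  data.frame(factor = g, top_freq(by_group[[g]]), stringsAsFactors = FALSE)
}))
rownames(res) <- NULL
res
```

For the real data, `n = 10L` would replace the default of 2. Because both frequencies come from the same helper, no separate tables need to be merged afterwards.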