  1. POR combining FF base and Sqldf
    <p>Up until this point I've been using a combination of sqldf and R functions to manage my datasets. However, I need to do a number of left joins on large datasets, and I run out of memory even when using sqldf with dbname=tempfile().</p>
    <p>The first thing I noticed with ff is that I cannot pass an ff object to sqldf. The second is that not all of my usual functions work on ff objects.</p>
    <p>Example of my normal joining:</p>
    <pre><code>base &lt;- read.csv(filename)
base &lt;- sqldf("select * from base where DATE &gt; 20120101")
for (j in list.files()) {
  # read each file in the working directory and keep only the join columns
  temp &lt;- read.csv(j)
  temp &lt;- sqldf("select MATCH_KEY, DATE from temp")
  base &lt;- sqldf("select * from base NATURAL LEFT OUTER JOIN temp")
}
</code></pre>
    <p>With ffbase I could not simply use "as.ffdf(temp)". The workaround was to write a physical temp file, read it back in as ff, and then merge the ff objects. I feel this is not a great way to work with ff. Are there any better alternatives?</p>
    <p>The second problem I'm facing is probably due to how unfamiliar I am with ff. I have some simple code that I don't know how to implement in ff. Basically, I have the data frame base, and I want to loop over its columns and count the number of times the value is greater than or equal to a certain number. Here is the idea using my dates example (in reality I'm also checking numbers, ratios, etc., but the idea is always the same):</p>
    <pre><code>checks &lt;- c(20010101, 20020101, 20030101)
summary &lt;- matrix(0, ncol = ncol(base), nrow = length(checks))
for (i in seq_along(checks)) {
  for (j in seq_len(ncol(base))) {
    # count the values in column j that are at least checks[i]
    summary[i, j] &lt;- sum(base[, j] &gt;= checks[i])
  }
}
</code></pre>
    <p>These functions wouldn't work with ff either. Right now I am reading in the files using sqldf and writing them to a temporary file, reading those in with ff, and then doing all the merging. After that I write out to a temporary file once again and read it back in as a normal data frame. Ouf! Any advice on improvements?</p>
    <p>[EDIT]</p>
    <p>A big question is how to convert a table created via sqldf (temp &lt;- sqldf(stuff)) using as.ffdf. I'm getting an error: "Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, : vmode 'character' not implemented"</p>
    <p>Also, here are two examples of operations I can't seem to get working in ff.</p>
    <p>1) I often replace missing values in a file with 0 to distinguish them from missing values created by a merge. I do this with</p>
    <pre><code>DF[is.na(DF)] &lt;- 0
</code></pre>
    <p>With ff it seems a bit more involved, and I worry about losing readability: <a href="https://stackoverflow.com/questions/13510838/replace-nas-in-a-ffdf-object">Replace NAs in a ffdf object</a></p>
    <p>2) Taking the sum over a column or row while looking for specific values. For example, how do I count the number of times "R" appears in a column in ff?</p>
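The "vmode 'character' not implemented" error quoted above suggests that the sqldf result holds character columns, which ff cannot store directly; ff does support factors. The following is a minimal sketch of one possible workaround, not a definitive fix: it assumes the text columns can reasonably be treated as factors. The sample data frame and the helper name `chars_to_factors` are hypothetical and not part of ff, ffbase, or sqldf.

```r
# Hypothetical stand-in for a data frame returned by sqldf(stuff);
# MATCH_KEY is assumed to come back as a character column.
temp <- data.frame(
  MATCH_KEY = c("a1", "b2", "c3"),
  DATE = c(20120105L, 20120210L, 20120315L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn every character column into a factor and
# leave all other columns untouched.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

temp <- chars_to_factors(temp)

# Attempt the ff conversion only when the ff package is installed.
if (requireNamespace("ff", quietly = TRUE)) {
  temp_ff <- ff::as.ffdf(temp)
}
```

If this works for the data at hand, it would avoid the round trip through a physical temp file. Columns with very many distinct strings may make the factor levels themselves large, so this is a trade-off rather than a general solution.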