<p>I'm not clear on your goal, but if you're trying to read all of these files into a single R data structure, then I see two major performance concerns:</p>
<ol>
<li>File access times. From the moment you call <code>read.csv</code>, a number of processes start on your machine: checking whether the file exists, locating it in memory or on disk (and reading the data into memory, if need be), then interpreting the data within R. I would expect this to be a roughly constant cost per file as you read in millions of files.</li>
<li>Growing your single data structure with each new file read. Every time you add a few rows to your matrix, R will likely need to allocate a similarly sized chunk of memory to hold the larger matrix. If you're growing your array 15 million times, you'll certainly notice a slowdown here. With this problem, performance gets progressively worse as you read in more files.</li>
</ol>
<p>So do some quick profiling and see how long the reads are taking. If they slow down progressively as you read in more files, then let's focus on problem #2. If they're consistently slow, then let's worry about problem #1.</p>
<p>Regarding solutions, I'd say you could start with two things:</p>
<ol>
<li>Combine the CSV files in another programming language. A simple shell script would likely do the job if you're just looping through files and concatenating them into a single large file. As Joshua and Richie mention below, you may be able to optimize this without leaving R by using the more efficient <code>scan()</code> or <code>readLines()</code> functions.</li>
<li>Pre-size your unified data structure. If you're using a matrix, for instance, set the number of rows to ~15 million x 100. That ensures R only has to find room in memory for this object once, and the rest of the operations just insert data into the pre-sized matrix.</li>
</ol>
<p>Add some more details of your code (what does the list you're using look like?) and we may be able to be more helpful.</p>
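<p>As a rough illustration of the pre-sizing and profiling suggestions, here is a minimal sketch. It assumes every file is purely numeric, has no header, and has the same number of rows and columns. The file names, sizes, and the variables <code>combined</code> and <code>read_secs</code> are made up for the example and are not taken from the question.</p>

```r
# Hypothetical setup: write a few small numeric CSV files into a fresh temp
# directory so the sketch is self-contained. Real data would already exist.
csv_dir <- tempfile("csv_demo_")
dir.create(csv_dir)
n_files <- 5L
rows_per_file <- 100L  # assumption: every file has the same number of rows
n_cols <- 3L           # assumption: every file has the same number of columns
for (i in seq_len(n_files)) {
  m <- matrix(as.numeric(i), nrow = rows_per_file, ncol = n_cols)
  write.table(m, file.path(csv_dir, sprintf("part_%03d.csv", i)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

files <- sort(list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE))

# Allocate the full result once instead of calling rbind() inside the loop.
combined <- matrix(NA_real_, nrow = length(files) * rows_per_file, ncol = n_cols)

# Record how long each read takes, to see whether the cost per file is
# constant or grows as more files are processed.
read_secs <- numeric(length(files))

for (i in seq_along(files)) {
  start <- proc.time()[["elapsed"]]
  # scan() returns the fields as one numeric vector, in row-by-row order.
  values <- scan(files[i], what = numeric(), sep = ",", quiet = TRUE)
  read_secs[i] <- proc.time()[["elapsed"]] - start

  # Copy this file's rows into their slot in the pre-sized matrix.
  rows <- ((i - 1L) * rows_per_file + 1L):(i * rows_per_file)
  combined[rows, ] <- matrix(values, ncol = n_cols, byrow = TRUE)
}
```

<p>Plotting <code>read_secs</code> against the file index is one way to tell the two problems apart. If file sizes vary, the fixed <code>rows_per_file</code> assumption no longer holds and the row offsets would have to be computed from the actual row counts.</p>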
 
