Breaking a list of character strings into partitions

<p>Here is my problem: I have a dataset with 200k rows.</p>
<ul>
<li>Each row corresponds to a test conducted on a subject.</li>
<li>Subjects have an unequal number of tests.</li>
<li>Each test is dated.</li>
</ul>
<p>I want to assign an index to each test, numbering each subject's tests in date order. For example, the first test of subject 1 would be 1 and the second test of subject 1 would be 2; the first test of subject 2 would be 1, and so on.</p>
<p>My strategy is to get a list of unique subject IDs and use <code>lapply</code> to subset the dataset into a list of data frames, one per subject, each holding that subject's tests. Ideally I would then sort each subject's data frame by date and assign an index to each test.</p>
<p>However, doing this on a 200k x 32 data frame made my laptop (Core i5 Sandy Bridge, 4 GB RAM) run out of memory quite quickly.</p>
<p>I have two questions:</p>
<ol>
<li>Is there a better way to do this?</li>
<li>If there is not, my only option for getting around the memory limit is to break my list of unique subject IDs into smaller sets (say, 1000 IDs each), run <code>lapply</code> over the dataset for each set, and join the resulting lists together at the end. In that case, how do I write a function that breaks my subject ID list into partitions, given an integer for the number of partitions? For example, <code>BreakPartition(Dataset, 5)</code> would split the dataset into 5 equal partitions.</li>
</ol>
<p>Here is code to generate some dummy data:</p>
<pre><code># 500 random five-letter subject IDs, with duplicates removed
UniqueSubjectID &lt;- sapply(1:500, function(i) paste(letters[sample(1:26, 5, replace = TRUE)], collapse = ""))
UniqueSubjectID &lt;- unique(UniqueSubjectID)
# 5000 tests, each assigned at random to one of those subjects
Dataset &lt;- data.frame(SubID = sample(UniqueSubjectID, 5000, replace = TRUE))
# One random test date in 2010 per row, stored as a "dd.mm.yyyy" string
Dates &lt;- sample(format(seq(ISOdate(2010, 1, 1), by = "day", length.out = 365), format = "%d.%m.%Y"), 5000, replace = TRUE)
Dataset &lt;- cbind(Dataset, Dates)
</code></pre>
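One possible approach to both questions, sketched in base R under some assumptions: for the first question, `ave()` can compute a running count within each subject after sorting, so only an integer vector is split by subject instead of one 32-column data frame per subject, which should be considerably lighter on memory. For the second question, a partition helper can assign each ID to one of `n` consecutive groups. The helper names `index_tests` and `break_partition`, the new column `TestIndex`, and the small `tests` data frame are all hypothetical; the sketch assumes the `SubID` and `Dates` columns from the dummy data, with dates stored as `dd.mm.yyyy` strings. Unlike the `BreakPartition(Dataset, 5)` signature in the question, `break_partition` takes the vector of IDs rather than the dataset.

```r
# Hypothetical helper: number each subject's tests 1, 2, 3, ... in date order.
# Assumes columns SubID and Dates, with Dates stored as "dd.mm.yyyy" strings.
index_tests <- function(df) {
  # Parse the strings so sorting is chronological rather than alphabetical
  parsed <- as.Date(df$Dates, format = "%d.%m.%Y")
  # Sort rows by subject, then by date within each subject
  df <- df[order(df$SubID, parsed), , drop = FALSE]
  # ave() applies seq_along() within each SubID group, so only an integer
  # vector is split, not the whole data frame
  df$TestIndex <- ave(seq_len(nrow(df)), df$SubID, FUN = seq_along)
  rownames(df) <- NULL
  df
}

# Hypothetical helper: split a vector (e.g. unique subject IDs) into n
# consecutive partitions whose sizes differ by at most one.
break_partition <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  # Position i goes to group ceiling(i * n / length(x)), which lies in 1..n
  groups <- ceiling(seq_along(x) * n / length(x))
  unname(split(x, groups))
}

# Illustrative data (made up, not the question's dummy data)
tests <- data.frame(
  SubID = c("abcde", "fghij", "abcde", "fghij", "abcde"),
  Dates = c("03.02.2010", "01.01.2010", "15.01.2010", "20.03.2010", "01.12.2009"),
  stringsAsFactors = FALSE
)

# Single pass over the whole data frame
indexed <- index_tests(tests)
print(indexed$TestIndex)  # 1 2 3 1 2

# Partitioned route: index each chunk of subjects, then stack the results
chunks <- lapply(break_partition(unique(tests$SubID), 2), function(ids) {
  index_tests(tests[tests$SubID %in% ids, , drop = FALSE])
})
combined <- do.call(rbind, chunks)
print(lengths(break_partition(letters[1:10], 3)))  # 3 3 4
```

Because each subject's index depends only on that subject's own rows, the partitioned route should give the same indices as a single call to `index_tests()`. It is only worth the extra bookkeeping if the single pass still runs out of memory. Parsing the dates matters: sorted as plain strings, `"03.02.2010"` would come before `"15.01.2010"`.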