Note that there are some explanatory texts on larger screens.

plurals
  1. POMake Dataframe loop code run faster
    primarykey
    data
    text
    <p><strong>DATA AND REQUIREMENTS</strong></p> <p>The first table (<code>myMatrix1</code>) is from an old geological survey that used different region boundaries (<code>begin</code> and <code>finish</code>) columns to the newer survey. What I wish to do is to match the begin and finish boundaries and then create two tables one for the new data on sedimentation and one for the new data on bore width characterised as a boolean.</p> <pre><code>myMatrix1 &lt;- read.table("/path/to/file") myMatrix2 &lt;- read.table("/path/to/file") &gt; head(myMatrix1) # this is the old data sampleIDs begin finish 1 19990224 4 5 2 20000224 5 6 3 20010203 6 8 4 20019024 29 30 5 20020201 51 52 &gt; head(myMatrix2) # this is the new data begin finish sedimentation boreWidth 1 0 10 1.002455 0.014354 2 11 367 2.094351 0.056431 3 368 920 0.450275 0.154105 4 921 1414 2.250820 1.004353 5 1415 5278 0.114109 NA` </code></pre> <p>Desired output:</p> <pre><code>&gt; head(myMatrix6) sampleIDs begin finish sedimentation #myMatrix4 1 19990224 4 5 1.002455 2 20000224 5 6 1.002455 3 20010203 6 8 2.094351 4 20019024 29 30 2.094351 5 20020201 51 52 2.094351 &gt; head(myMatrix7) sampleIDs begin finish boreWidthThresh #myMatrix5 1 19990224 4 5 FALSE 2 20000224 5 6 FALSE 3 20010203 6 8 FALSE 4 20019024 29 30 FALSE 5 20020201 51 52 FALSE` </code></pre> <p><strong>CODE</strong></p> <p>The following code has taken me several hours to run on my dataset (about 5 million data points). Is there any way to change the code to make it run any faster? </p> <pre><code># create empty matrix for sedimentation myMatrix6 &lt;- data.frame(NA,NA,NA,NA)[0,] names(myMatrix6) &lt;- letters[1:4] # create empty matrix for bore myMatrix7 &lt;- data.frame(NA,NA,NA,NA)[0,] names(myMatrix7) &lt;- letters[1:4] for (i in 1:nrow(myMatrix2)) { # create matrix that has the value of myMatrix1$begin being # situated between the values of myMatrix2begin[i] and myMatrix2finish[i] myMatrix3 &lt;- myMatrix1[which((myMatrix1$begin &gt; myMatrix2$begin[i]) &amp; (myMatrix1$begin &lt; myMatrix2$finish[i])),] myMatrix4 &lt;- rep(myMatrix2$sedimentation, nrow(myMatrix3)) if (is.na(myMatrix2$boreWidth[i])) { myMatrix5 &lt;- rep(NA, nrow(myMatrix3)) } else if (myMatrix2$boreWidth[i] == 0) { myMatrix5 &lt;- rep(TRUE, nrow(myMatrix3)) } else if (myMatrix2$boreWidth[i] &gt; 0) { myMatrix5 &lt;- rep(FALSE, nrow(myMatrix3)) } myMatrix6 &lt;- rbind(myMatrix6, cbind(myMatrix3, myMatrix4)) myMatrix7 &lt;- rbind(myMatrix7, cbind(myMatrix3, myMatrix5)) } </code></pre> <p>EDIT: </p> <p><code>&gt; dput(head(myMatrix2)</code></p> <pre><code>structure(list(V1 = structure(c(6L, 1L, 2L, 4L, 5L, 3L), .Label = c("0", "11", "1415", "368", "921", "begin"), class = "factor"), V2 = structure(c(6L, 1L, 3L, 5L, 2L, 4L), .Label = c("10", "1414", "367", "5278", "920", "finish"), class = "factor"), V3 = structure(c(6L, 3L, 4L, 2L, 5L, 1L), .Label = c("0.114109", "0.450275", "1.002455", "2.094351", "2.250820", "sedimentation"), class = "factor"), V4 = structure(c(5L, 1L, 2L, 3L, 4L, 6L), .Label = c("0.014354", "0.056431", "0.154105", "1.004353", "boreWidth", "NA"), class = "factor")), .Names = c("V1", "V2", "V3", "V4"), row.names = c(NA, 6L), class = "data.frame") </code></pre> <p><code>&gt; dput(head(myMatrix1)</code></p> <pre><code>structure(list(V1 = structure(c(6L, 1L, 2L, 3L, 4L, 5L), .Label = c("19990224", "20000224", "20010203", "20019024", "20020201", "sampleIDs"), class = "factor"), V2 = structure(c(6L, 2L, 3L, 5L, 1L, 4L), .Label = c("29", "4", "5", "51", "6", "begin"), class = "factor"), V3 = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("30", "5", "52", "6", "8", "finish"), class = "factor")), .Names = c("V1", "V2", "V3" ), row.names = c(NA, 6L), class = "data.frame") </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload