<p>Here is a generalized function:</p>
<pre><code>PatternMatcher &lt;- function(data, pattern, idx = NULL) {
  p &lt;- unlist(pattern[1])
  if (is.null(idx)) {
    # First call: find every row containing the last pattern element,
    # then walk the remaining elements in reverse order.
    p &lt;- unlist(pattern[length(pattern)])
    PatternMatcher(data, rev(pattern)[-1],
                   idx = Filter(function(n) all(p %in% intersect(data[n, ], p)),
                                1:nrow(data)))
  } else if (length(pattern) &gt; 1) {
    # Only the rows directly above the previous matches are checked.
    PatternMatcher(data, pattern[-1],
                   idx = Filter(function(n) all(p %in% intersect(data[n, ], p)),
                                idx - 1))
  } else {
    Filter(function(n) all(p %in% intersect(data[n, ], p)), idx - 1)
  }
}
</code></pre>
<p>This is a recursive function that shortens <code>pattern</code> on every call and checks only the rows adjacent to those identified in the previous call. It starts from the last pattern element and steps back one row at a time, so the result is the row where the pattern begins. The list structure allows passing the pattern in a convenient way:</p>
<pre><code>PatternMatcher(m, list(37, list(10, 29), 42))
# [1] 57
PatternMatcher(m, list(list(45, 24, 1), 7, list(45, 31), 4))
# [1] 2
PatternMatcher(m, list(1, 3))
# [1] 47 48 93
</code></pre>
<p><strong>Edit:</strong> The idea of the function above seems fine: check all rows for the vector <code>pattern[[1]]</code> and get indices <code>r1</code>, then check rows <code>r1+1</code> for <code>pattern[[2]]</code> and get <code>r2</code>, etc. But the first step, which goes through all rows, takes a long time. Of course, every step would be slow with e.g. <code>m &lt;- matrix(sample(1:10, 800, replace = TRUE), ncol = 8)</code>, i.e. when the index sets <code>r1</code>, <code>r2</code>, ... shrink very little from step to step.</p>
<p>So here is another approach: <code>PatternMatcher</code> looks very similar, but a new function <code>matchRow</code> finds the rows that contain all elements of <code>vector</code>.</p>
<pre><code>matchRow &lt;- function(data, vector, idx = NULL) {
  if (is.null(idx)) {
    # First element: search the whole matrix at once.
    matchRow(data, vector[-1],
             as.numeric(unique(rownames(which(data == vector[1], arr.ind = TRUE)))))
  } else if (length(vector) &gt; 0) {
    # Remaining elements: search only the candidate rows.
    matchRow(data, vector[-1],
             as.numeric(unique(rownames(which(data[idx, , drop = FALSE] == vector[1],
                                              arr.ind = TRUE)))))
  } else {
    idx
  }
}

PatternMatcher &lt;- function(data, pattern, idx = NULL) {
  p &lt;- pattern[[1]]
  if (is.null(idx)) {
    # Row names carry the original row numbers through the subsetting in matchRow.
    rownames(data) &lt;- 1:nrow(data)
    p &lt;- pattern[[length(pattern)]]
    PatternMatcher(data, rev(pattern)[-1], idx = matchRow(data, p))
  } else if (length(pattern) &gt; 1) {
    PatternMatcher(data, pattern[-1], idx = matchRow(data, p, idx - 1))
  } else {
    matchRow(data, p, idx - 1)
  }
}
</code></pre>
<p>Comparison with the previous function (called <code>OldPatternMatcher</code> below):</p>
<pre><code>library(rbenchmark)
bigM &lt;- matrix(sample(1:50, 800000, replace = TRUE), ncol = 8)
benchmark(PatternMatcher(bigM, list(37, c(10, 29), 42)),
          PatternMatcher(bigM, list(1, 3)),
          OldPatternMatcher(bigM, list(37, list(10, 29), 42)),
          OldPatternMatcher(bigM, list(1, 3)),
          replications = 10, columns = c("test", "elapsed"))
#                                                  test elapsed
# 4                 OldPatternMatcher(bigM, list(1, 3))   61.14
# 3 OldPatternMatcher(bigM, list(37, list(10, 29), 42))   63.28
# 2                    PatternMatcher(bigM, list(1, 3))    1.58
# 1       PatternMatcher(bigM, list(37, c(10, 29), 42))    2.02

verybigM1 &lt;- matrix(sample(1:40, 8000000, replace = TRUE), ncol = 20)
verybigM2 &lt;- matrix(sample(1:140, 8000000, replace = TRUE), ncol = 20)
benchmark(PatternMatcher(verybigM1, list(37, c(10, 29), 42)),
          PatternMatcher(verybigM2, list(37, c(10, 29), 42)),
          find.combo(verybigM1, convert.gui.input("37;10,29;42")),
          find.combo(verybigM2, convert.gui.input("37;10,29;42")),
          replications = 20, columns = c("test", "elapsed"))
#                                                      test elapsed
# 3 find.combo(verybigM1, convert.gui.input("37;10,29;42"))   17.55
# 4 find.combo(verybigM2, convert.gui.input("37;10,29;42"))   18.72
# 1      PatternMatcher(verybigM1, list(37, c(10, 29), 42))   15.84
# 2      PatternMatcher(verybigM2, list(37, c(10, 29), 42))   19.62
</code></pre>
<p>Note that the <code>pattern</code> argument must now be written as <code>list(37, c(10, 29), 42)</code> instead of <code>list(37, list(10, 29), 42)</code>. And finally, a wrapper that accepts the pattern as a string, with <code>;</code> separating rows and <code>,</code> separating values within a row (an empty slot matches any row):</p>
<pre><code>fastPattern &lt;- function(data, pattern)
  PatternMatcher(data, lapply(strsplit(pattern, ";")[[1]],
                              function(i) as.numeric(unlist(strsplit(i, split = ",")))))

fastPattern(m, "37;10,29;42")
# [1] 57
fastPattern(m, "37;;42")
# [1] 57 4
fastPattern(m, "37;;;42")
# [1] 33 56 77
</code></pre>
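<p>For checking results on small inputs, a direct (and slow) reference implementation can be handy. The sketch below is not part of the approach above: <code>naivePattern</code> and <code>m_small</code> are hypothetical names introduced here for illustration, and an empty vector is assumed to act as a wildcard row, mirroring the empty slot in <code>"37;;42"</code>.</p>

```r
# Hypothetical helper (not from the answer above): tests every possible
# start row directly. Intended only for validating results on small matrices.
naivePattern <- function(data, pattern) {
  k <- length(pattern)
  if (k == 0 || nrow(data) < k) return(integer(0))
  starts <- seq_len(nrow(data) - k + 1)
  # Keep start row s if row s + j - 1 contains every value of pattern[[j]].
  # An empty pattern[[j]] matches any row, because all(logical(0)) is TRUE.
  keep <- vapply(starts, function(s) {
    all(vapply(seq_len(k), function(j) {
      all(pattern[[j]] %in% data[s + j - 1, ])
    }, logical(1)))
  }, logical(1))
  starts[keep]
}

# Small deterministic matrix: 37 / {10, 29} / 42 occupy rows 2, 3 and 4.
m_small <- matrix(c( 1,  2,  3,
                    37,  5,  6,
                    29, 10,  7,
                     8, 42,  9,
                    37, 10, 29), ncol = 3, byrow = TRUE)

naivePattern(m_small, list(37, c(10, 29), 42))
# [1] 2
naivePattern(m_small, list(37, numeric(0), 42))
# [1] 2
```

<p>Because it scans every start row against every pattern element, this version is only suitable for small matrices, where it can serve as a cross-check for the faster functions.</p>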