  1. Return original search terms for grep in R
    <p>I have a list of items and a list of search terms, and I am trying to do two things:</p>
    <ol>
    <li>Search through the items for matches to any of the search terms, and return true iff a match is found.</li>
    <li>For all items where true is returned (i.e., there was a match), I would like to also return the original search term which was matched in step 1.</li>
    </ol>
    <p>So, given the following data frame:</p>
    <pre><code>             items
1             alex
2 alex is a person
3   this is a test
4            false
5    this is cathy
</code></pre>
    <p>and the following list of search terms:</p>
    <pre><code>"alex" "bob" "cathy" "derrick" "erica" "ferdinand"
</code></pre>
    <p>I would like to create the following output:</p>
    <pre><code>             items matches original
1             alex    TRUE     alex
2 alex is a person    TRUE     alex
3   this is a test   FALSE     &lt;NA&gt;
4            false   FALSE     &lt;NA&gt;
5    this is cathy    TRUE    cathy
</code></pre>
    <p>Step 1 is fairly straightforward, but I am having trouble with step 2. To create the 'matches' column, I use <code>grepl()</code> to create a variable which is <code>TRUE</code> if a row in <code>d$items</code> contains one of the search terms, and <code>FALSE</code> otherwise.</p>
    <p>For step 2, my thought was that I should be able to just use <code>grep()</code> while specifying <code>value = T</code>, as shown in my code below. However, this returns the wrong value: rather than returning the original search term which was matched by grep, it returns the value of the item that was matched. So I get the following output:</p>
    <pre><code>             items matches         original
1             alex    TRUE             alex
2 alex is a person    TRUE alex is a person
3   this is a test   FALSE             &lt;NA&gt;
4            false   FALSE             &lt;NA&gt;
5    this is cathy    TRUE    this is cathy
</code></pre>
    <p>This is the code I am using right now. Any thoughts would be much appreciated!</p>
    <pre><code># Dummy data and search terms
d = data.frame(items = c("alex", "alex is a person", "this is a test", "false", "this is cathy"))
searchTerms = c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")

# Return true iff search term is found in items column, not between letters
d$matches = grepl(paste("(^| |[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ])", searchTerms, "($| |[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ])", sep = "", collapse = "|"),
                  d[,1], ignore.case = TRUE)

# Subset data
dMatched = d[d$matches==T,]

# This is where the problem is: return the value that was originally matched with grepl above
dMatched$original = grep(paste("(^| |[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ])", searchTerms, "($| |[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ])", sep = "", collapse = "|"),
                         dMatched[,1], ignore.case = TRUE, value = TRUE)
d$original[d$matches==T] = dMatched$original
</code></pre>
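One possible way to get at the matched term, offered only as a sketch and not as an accepted answer: `grep(..., value = TRUE)` returns elements of the vector being searched, so it cannot report which alternative of a collapsed pattern matched. Testing each search term separately keeps that information. The names `patterns`, `hits` and `firstHit` are introduced here for illustration, and the boundary check is rewritten with the POSIX class `[:alpha:]`, which is assumed to be equivalent in intent to the explicit letter lists in the question.

```r
# Dummy data and search terms, as in the question
d <- data.frame(items = c("alex", "alex is a person", "this is a test",
                          "false", "this is cathy"),
                stringsAsFactors = FALSE)
searchTerms <- c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")

# One pattern per search term; the term must not sit between letters
# (assumption: [^[:alpha:]] stands in for the explicit letter lists)
patterns <- paste0("(^|[^[:alpha:]])", searchTerms, "([^[:alpha:]]|$)")

# hits[i, j] is TRUE when item i matches search term j
# (assumes d has more than one row, so sapply() returns a matrix)
hits <- sapply(patterns, grepl, x = d$items, ignore.case = TRUE)

# Index of the first matching term per item, NA when nothing matched
firstHit <- unname(apply(hits, 1, function(row) which(row)[1]))

d$matches  <- !is.na(firstHit)
d$original <- searchTerms[firstHit]
```

For the sample data this gives `matches` of TRUE, TRUE, FALSE, FALSE, TRUE and `original` of alex, alex, NA, NA, cathy. When an item contains several search terms, this sketch reports only the first one in `searchTerms` order; collecting all of them would need a different reduction over the rows of `hits`.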