StackOverflow2013

<p>Grep uses <a href="http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html" rel="nofollow">regular expressions</a> to search for substrings matching a pattern. For your problem of extracting certain elements from a filename, you would probably want to use <strong>capturing groups</strong> to pull out the different parts.</p>
<p>An example of a regular expression with a capturing group would be:</p>
<pre><code>"Hello, (\w+)"
</code></pre>
<p>This matches strings of the format "Hello, Friend". Here is an explanation of the pattern:</p>
<ul>
<li><code>\w</code> matches a "word character" (a letter, digit, or underscore), while</li>
<li><code>+</code> means that one or more of them will be matched.</li>
<li>For the other structural parts of your file name convention, we can include <code>_</code> as is, but have to escape <code>.</code> because it has a special meaning in regular expressions (it matches any character).</li>
<li>To define a group that you want to extract (a capturing group), you put that part of the pattern in parentheses: <code>(\w+)</code></li>
</ul>
<p>Using all that, we get the following pattern:</p>
<pre><code>"(\w+)_(\w+)\.doc\.(\w+)\.(\w+)_(\w+)_(\w+)"
</code></pre>
<p>To get the pattern to work in R, we have to escape every <code>\</code> character as <code>\\</code>, because <code>\</code> is also the escape character inside R string literals:</p>
<pre><code>> pattern = "(\\w+)_(\\w+)\\.doc\\.(\\w+)\\.(\\w+)_(\\w+)_(\\w+)"
</code></pre>
<p>While grep and regex are powerful, I personally prefer the <a href="http://cran.r-project.org/web/packages/stringr/index.html" rel="nofollow">stringr</a> package for its simpler interface. In particular, the <code>str_match</code> function can be very helpful: it returns a matrix with column 1 giving the full match and all subsequent columns giving the matches to the capturing groups:</p>
<pre><code>> library(stringr)
> x = "X_Y.doc.Z.x_y_z"
> str_match(x, pattern)
     [,1]              [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "X_Y.doc.Z.x_y_z" "X"  "Y"  "Z"  "x"  "y"  "z" 
</code></pre>
<p>If you are new to regular expressions, you should be fine with a tutorial for any language, such as <a href="http://www.regular-expressions.info/tutorial.html" rel="nofollow">this one</a>. The syntax is mostly the same across languages and differs only in details, although not every feature is supported by every programming language. If you want to try out your expressions before putting them into your programs, I highly recommend <a href="http://regexpal.com/" rel="nofollow">RegexPal</a>.</p>
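<p>If you would rather not depend on an extra package, the same groups can probably be extracted with base R as well. The following is only a minimal sketch using <code>regexec</code> and <code>regmatches</code>; the field names <code>f1</code> to <code>f6</code> are placeholders I made up for illustration, not part of your naming scheme:</p>
<pre><code>pattern = "(\\w+)_(\\w+)\\.doc\\.(\\w+)\\.(\\w+)_(\\w+)_(\\w+)"
x = "X_Y.doc.Z.x_y_z"

# regexec() locates the full match and every capturing group;
# regmatches() extracts the corresponding substrings as a list,
# with one character vector per input string.
parts = regmatches(x, regexec(pattern, x))[[1]]

# parts[1] is the full match; the remaining elements are the groups.
# The names below are hypothetical labels for the six groups.
fields = setNames(parts[-1], c("f1", "f2", "f3", "f4", "f5", "f6"))
print(fields)
</code></pre>
<p>One caveat: since <code>\w</code> also matches <code>_</code>, file names whose parts themselves contain underscores can be split in more than one way. If that is a concern, a stricter character class such as <code>[^_.]+</code> in place of <code>\w+</code> may be worth trying.</p>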
