  1. POR regex search to capture URLs
<blockquote> <p><strong>Possible Duplicate:</strong><br> <a href="https://stackoverflow.com/questions/11413294/r-regular-expression-http-matching">R regular expression: http matching</a> </p> </blockquote>
<p>I'm trying to capture URLs from a chunk of source code using regex.</p>
<p>The URLs follow a pattern and have the following form:</p>
<ul> <li>www.google.com/..../1-1,1" </li> <li>www.google.com/..../1-2,2"</li> <li>www.google.com/..../1-20,20"</li> </ul>
<p>So far I can locate each URL using the following code:</p>
<pre><code>pattern = paste("1-", 1:20,",", 1:20, "\"", sep="") </code></pre>
<p>This gives me a vector of:</p>
<ul> <li>1-1,1 </li> <li>1-2,2</li> <li>.....</li> <li>1-20,20</li> </ul>
<p>I can then use these patterns to find the position of each URL inside the source code.</p>
<p>Let's say, for example, that the whole source code is simply: "<a href="http://www.google.com/word/1-1,1" rel="nofollow noreferrer">http://www.google.com/word/1-1,1</a>>"</p>
<pre><code>regexpr("1-1,1", test1k, TRUE) </code></pre>
<p>gives me:</p>
<blockquote> <p>[1] 28 attr(,"match.length") [1] 5</p> </blockquote>
<p>This means that the pattern 1-1,1 starts at position 28 and is 5 characters long. Given this information, how would I select the whole URL, starting at "<a href="http://ww" rel="nofollow noreferrer">http://ww</a>..." and running until the end "1-1,1>"? </p>
<p>I guess what I'm asking is: given position 28, is there a function to select the nearest "http://" string going backwards (this marks the start of the URL)? Similarly, given position 28, is there a way to select the nearest ">" character going forward (this marks the end of the URL)?</p>
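One possible way to approach this, sketched under assumptions: rather than searching backwards and forwards from position 28, a single pattern can anchor on "http://" and stop before the closing quote or ">". The sample string `src` and the name `url_pattern` below are hypothetical and not from the question; only base R functions (`gregexpr`, `regmatches`) are used.

```r
# Hypothetical stand-in for the page source described in the question.
src <- 'see <a href="http://www.google.com/word/1-1,1"> and <a href="http://www.google.com/other/1-2,2">'

# "http://", then any run of characters that are not a quote, ">" or space,
# ending in "/1-N,N" where the backreference \\1 forces both numbers to match.
url_pattern <- "http://[^\"> ]*/1-([0-9]+),\\1"

# gregexpr() finds every match; regmatches() extracts the matched substrings.
urls <- regmatches(src, gregexpr(url_pattern, src, perl = TRUE))[[1]]
print(urls)
```

With this approach the start and end of each URL come from the pattern itself, so the 20 separate "1-N,N" patterns and the position arithmetic are not needed. If the real source uses other delimiters around the URLs, the character class would need adjusting.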
 
