Regex help: Identifying websites in text
<p>I am trying to write a function that removes websites from a piece of text. I have:</p>
<pre><code>removeWebsites &lt;- function(text){
  text = gsub("(http://|https://|www.)[[:alnum:]~!#$%&amp;+-=?,:/;._]*",'',text)
  return(text)
}
</code></pre>
<p>This handles a large part of the problem, but it misses a common case, i.e. something of the form <code>xyz.com</code>.</p>
<p>I do not wish to add <code>.com</code> at the end of the above regex, as it limits the scope of that regex. However, I tried writing some more regexes, like:</p>
<pre><code>gsub("[[:alnum:]~!#$%&amp;+-=?,:/;._]*.com",'',testset[10])
</code></pre>
<p>This worked, but it also changed email IDs of the form <code>abc@xyz.com</code> to <code>abc@</code>. I don't want this, so I modified it to:</p>
<pre><code>gsub("*((^@)[[:alnum:]~!#$%&amp;+-=?,:/;._]*).com",'\\1',testset[10])
</code></pre>
<p>This left the email IDs alone but stopped recognising websites of the form <code>xyz.com</code>.</p>
<p>I understand that I need some sort of set difference here, along the lines of what was explained <a href="https://stackoverflow.com/questions/9944361/aflex-regular-expression-difference">here</a>, but I was not able to implement it (mainly because I was not able to completely understand it). Any idea how I can go about solving my problem?</p>
<p>Edit: I tried negative lookaheads:</p>
<pre><code>gsub("[[:alnum:]~!#$%&amp;+-=?,:/;._](?!@)[^(?!.*@)]*.com",'',testset[10])
</code></pre>
<p>I got an 'invalid regex' error. I believe a little help correcting it may get this to work...</p>
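One possible direction, offered as a sketch rather than a definitive answer: R's default regex engine does not support lookarounds, which is the likely cause of the 'invalid regex' error, whereas passing `perl = TRUE` to `gsub` switches to PCRE, which does. The function name `remove_websites` and both patterns below are illustrative assumptions, not code from the original post. The second pattern only targets bare `.com` domains and uses a lookbehind to skip anything directly preceded by `@` or by another domain character.

```r
# Hypothetical helper (not from the original post): strips URLs and bare
# .com domains while leaving email addresses such as abc@xyz.com intact.
remove_websites <- function(text) {
  # Step 1: URLs that start with http://, https:// or www.
  # The hyphen is placed last in the class so it is a literal, not a range.
  text <- gsub("(https?://|www\\.)[[:alnum:]~!#$%&+=?,:/;._-]*",
               "", text, perl = TRUE)

  # Step 2: bare domains ending in .com.
  # (?<![[:alnum:]@._-]) : the match may not start right after '@' or in the
  #                        middle of a word, so the domain part of an email
  #                        address is skipped.
  # (?!@)                : the match may not be the local part of an email
  #                        address (e.g. john.com@example.org).
  gsub("(?<![[:alnum:]@._-])(?:[[:alnum:]-]+\\.)+com\\b(?!@)",
       "", text, perl = TRUE)
}

remove_websites("mail abc@xyz.com or visit xyz.com now")
```

Under these assumptions the email address survives and the standalone `xyz.com` is removed. The sketch does not strip a path that follows a bare domain (for example `xyz.com/page`) and ignores top-level domains other than `.com`; both would need extra alternatives in the second pattern.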
 
