Issue with string replacing and regex
<p>I am writing a function to convert URLs to working links. It must handle every form: a full URL like <code>http://link.com</code>, one without the protocol like <code>www.link.com</code>, and even one with just the extension like <code>link.com</code>.</p>
<p>So far my function works, but there is a weird bug when the same link appears several times in a string.</p>
<pre><code>import re
import cgi

def process_links(string):
    """Convert urls to links in a string"""
    # http + https
    links = re.findall("(https?://[^\s]+)", string)
    # www
    links2 = re.findall("(w{3}\.[^\s]+)", string)
    # only extension
    links3 = re.findall("([^\s]+\.[^\s]{2,})", string)
    links = links + links2 + links3
    # remove duplicates
    links = list(set(links))
    string = cgi.escape(string)
    for link in links:
        # make sure the href attr starts with http|https
        if re.match('https?://', link) is None:
            http_link = 'http://'+link
        else:
            http_link = link
        htmlLink = '&lt;a href="'+http_link+'"&gt;'+link+'&lt;/a&gt;'
        string = re.sub(link, htmlLink, string)
    return string
</code></pre>
<p>Working and failing examples:</p>
<pre><code># working
string = 'firstlink.com and www.secondlink.com'
# output:
# '&lt;a href="http://firstlink.com"&gt;firstlink.com&lt;/a&gt; and &lt;a href="http://www.secondlink.com"&gt;www.secondlink.com&lt;/a&gt;'

# failing: when the same link appears several times
string = 'firstlink.com and http://firstlink.com'
# output:
# &lt;a href="&lt;a href="http://firstlink.com"&gt;http://firstlink.com&lt;/a&gt;"&gt;firstlink.com&lt;/a&gt; and http://&lt;a href="&lt;a href="http://firstlink.com"&gt;http://firstlink.com&lt;/a&gt;"&gt;firstlink.com&lt;/a&gt;
</code></pre>
<p>I've never used a regex this "complicated" in Python and can't figure out why it behaves this way. I think it comes from the <code>re.sub()</code> part, which may be replacing something that has already been replaced?</p>
<p>PS: my function is probably not the best and can certainly be improved. If you have any suggestions, I'm listening.</p>
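<p>The suspicion about <code>re.sub()</code> looks right: each call replaces every occurrence of <code>link</code> in the whole string, including occurrences inside anchors inserted on an earlier iteration, and <code>firstlink.com</code> is a substring of <code>http://firstlink.com</code>. One way to avoid this, sketched below, is to do a single pass with one combined pattern and a replacement callback, so no inserted markup is ever scanned again. This is a minimal sketch, not a definitive fix: the names <code>URL_PATTERN</code> and <code>to_anchor</code> are made up for illustration, <code>html.escape</code> is swapped in for <code>cgi.escape</code> (which was removed in Python 3.8), and the pattern keeps the loose matching of the original three regexes, so trailing punctuation and tokens such as <code>e.g.</code> would still be picked up.</p>

```python
import html
import re

# Hypothetical combined pattern; alternatives are tried left to right:
#   1. http:// or https:// URLs
#   2. www. URLs
#   3. bare "name.ext" tokens with at least two characters after the dot
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+|\S+\.\S{2,}")


def to_anchor(match):
    """Build the anchor for one matched link (illustrative helper)."""
    link = match.group(0)
    # Make sure the href attribute starts with http:// or https://
    if link.startswith(("http://", "https://")):
        href = link
    else:
        href = "http://" + link
    return '<a href="{0}">{1}</a>'.format(href, link)


def process_links(text):
    """Convert URLs to links in a single left-to-right pass."""
    # Escape first, so the matched text is already safe to embed in HTML.
    escaped = html.escape(text)
    # Each URL is matched exactly once; the generated anchors are never rescanned.
    return URL_PATTERN.sub(to_anchor, escaped)


if __name__ == "__main__":
    print(process_links("firstlink.com and http://firstlink.com"))
```

<p>Because the callback receives each match object directly, the link text is never reinterpreted as a regular expression, which also sidesteps the problem of unescaped <code>.</code> or <code>?</code> characters in a URL being treated as regex metacharacters.</p>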