<p>As you've noticed, an expression such as the following is greedy:</p>
<pre><code>http:.*\.jpg
</code></pre>
<p>That means it reads as much input as possible while still satisfying the expression.</p>
<p>It's the "<code>*</code>" operator that makes it greedy. There's a standard regex technique for making it non-greedy: add the "<code>?</code>" modifier after the "<code>*</code>".</p>
<pre><code>http:.*?\.jpg
</code></pre>
<p>Now it will match as little as possible while still satisfying the expression (i.e. it will stop searching at the first occurrence of ".jpg").</p>
<p>Of course, if you have a .jpg in the middle of a URL, like:</p>
<pre><code>http://mydomain.com/some.jpg-folder/foo.jpg
</code></pre>
<p>the non-greedy expression will stop at <code>some.jpg</code> and will not match the full URL.</p>
<p>You'll want to define the end of the URL as something that can't be considered part of the URL, such as a space, a new line, or (if the URL is nested inside parentheses) a closing parenthesis. However, a single small regex cannot solve this when the URL is embedded in written language, because URL boundaries there are often ambiguous.</p>
<p>Take for example:</p>
<pre><code>At this page, http://mysite.com/puppy.html, there's a cute little puppy dog.
</code></pre>
<p>The comma could technically be a part of a URL. You have to deal with a lot of ambiguities like this when looking for URLs in written text, and it's hard to avoid bugs caused by them.</p>
<p>EDIT | Here's an example of a regex in PHP that is a quick and dirty solution, being greedy only where needed and <em>trying</em> to deal with the English language:</p>
<pre><code>&lt;?php
$str = "Checkout http://www.foo.com/test?items=bat,ball, for info about bats and balls";
preg_match('/https?:\/\/([a-zA-Z0-9][a-zA-Z0-9-]*)(\.[a-zA-Z0-9-]+)*((\/[^\s]*)(?=[\s\.,;!\?]))\b/i', $str, $matches);
var_dump($matches);
</code></pre>
<p>It outputs:</p>
<pre><code>array(5) {
  [0]=&gt;
  string(38) "http://www.foo.com/test?items=bat,ball"
  [1]=&gt;
  string(3) "www"
  [2]=&gt;
  string(4) ".com"
  [3]=&gt;
  string(20) "/test?items=bat,ball"
  [4]=&gt;
  string(20) "/test?items=bat,ball"
}
</code></pre>
<p>The explanation is in the comments.</p>
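<p>To compare the greedy, non-greedy, and delimiter-based approaches on the same input, here is a minimal sketch. The sample sentence and the variable names are made up for illustration, and the third pattern assumes the URL is followed by whitespace or the end of the input, which will not hold for every text.</p>

```php
<?php
// Hypothetical sample text: two image URLs, the first with ".jpg" inside a folder name.
$text = 'See http://mydomain.com/some.jpg-folder/foo.jpg and http://mydomain.com/bar.jpg today';

// Greedy: ".*" runs on to the LAST ".jpg" in the text, swallowing both URLs.
preg_match('/http:.*\.jpg/', $text, $greedy);

// Non-greedy: ".*?" stops at the FIRST ".jpg", cutting the first URL short.
preg_match('/http:.*?\.jpg/', $text, $lazy);

// Delimited (assumption: a URL ends at whitespace or end of input).
// "\S*" cannot cross whitespace, and the lookahead requires whitespace
// or end of input immediately after ".jpg".
preg_match_all('/http:\S*\.jpg(?=\s|$)/', $text, $delimited);

var_dump($greedy[0], $lazy[0], $delimited[0]);
```

<p>In this sketch only the delimited pattern returns both URLs in full. It would still include a trailing comma or period that directly follows a URL in prose, which is the ambiguity described above.</p>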
 
