<p>Basically, what you want is:</p>
<pre><code>google.com -&gt; google.com -&gt; google
www.google.com -&gt; google.com -&gt; google
google.co.uk -&gt; google.co.uk -&gt; google
www.google.co.uk -&gt; google.co.uk -&gt; google
www.google.org -&gt; google.org -&gt; google
www.google.org.uk -&gt; google.org.uk -&gt; google
</code></pre>
<p>Optional:</p>
<pre><code>www.google.com -&gt; google.com -&gt; www.google
images.google.com -&gt; google.com -&gt; images.google
mail.yahoo.co.uk -&gt; yahoo.co.uk -&gt; mail.yahoo
mail.yahoo.com -&gt; yahoo.com -&gt; mail.yahoo
www.mail.yahoo.com -&gt; yahoo.com -&gt; mail.yahoo
</code></pre>
<p>You don't need to construct an ever-changing regex, as 99% of domains will be matched properly if you simply look at the second-last part of the name:</p>
<pre><code>(co|com|gov|net|org)
</code></pre>
<p>If it is one of these, keep the last three dot-separated parts of the name; otherwise keep the last two. Simple. Now, my regex wizardry is no match for that of some other SO'ers, so the best way I've found to achieve this is with some code, assuming you've already stripped off the path:</p>
<pre><code>my @d = split /\./, $domain;              # split the domain part into an array
my $c = @d;                               # count how many parts
my $dest = $d[$c-2] . '.' . $d[$c-1];     # use the last 2 parts
# is the second-last part exactly one of these, with a third part available?
if ($c &gt; 2 and $d[$c-2] =~ m/^(co|com|gov|net|org)$/) {
    $dest = $d[$c-3] . '.' . $dest;       # if so, add a third part
}
print $dest;                              # show it
</code></pre>
<p>To just get the name, as per your question:</p>
<pre><code>my @d = split /\./, $domain;              # split the domain part into an array
my $c = @d;                               # count how many parts
my $dest;
# is the second-last part exactly one of these, with a third part available?
if ($c &gt; 2 and $d[$c-2] =~ m/^(co|com|gov|net|org)$/) {
    $dest = $d[$c-3];                             # if so, give the third last
    $dest = $d[$c-4] . '.' . $dest if ($c &gt; 3);   # optional bit
} else {
    $dest = $d[$c-2];                             # else the second last
    $dest = $d[$c-3] . '.' . $dest if ($c &gt; 2);   # optional bit
}
print $dest;                              # show it
</code></pre>
<p>I like this approach because it's maintenance-free, unless you want to validate that it's actually a legitimate domain. But that's kind of pointless, because you're most likely only using this to process log files, and an invalid domain wouldn't find its way in there in the first place.</p>
<p>If you'd like to match "unofficial" subdomains such as bozo.za.net, bozo.au.uk or bozo.msf.ru, just add za, au and msf to the alternation in the regex.</p>
<p>I'd love to see someone do all of this using just a regex; I'm sure it's possible.</p>
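<p>For what it's worth, here is a rough sketch of how a regex-only version might look. It is an illustration under assumptions rather than a vetted rule: <code>split_domain</code> is a hypothetical helper name, the host is assumed to have its path already stripped, and the same five second-level labels are used. It leans on the regex engine returning the leftmost match, so the three-part form wins whenever the second-last label is one of the listed ones.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the pair
# (domain, name) for a bare host name, or an empty list when the host
# has fewer than two dot-separated parts.
sub split_domain {
    my ($host) = @_;
    # $1 = registrable domain, $2 = the name in front of the suffix.
    # The optional group only matches a whole label followed by a dot,
    # so "cisco.com" is not mistaken for a "co" second-level domain.
    if ($host =~ /(([^.]+)\.(?:(?:co|com|gov|net|org)\.)?[^.]+)$/) {
        return ($1, $2);
    }
    return;
}

for my $host (qw(google.com www.google.co.uk www.google.org.uk cisco.com)) {
    my ($domain, $name) = split_domain($host);
    print "$host -> $domain -> $name\n";
}
```

<p>For the four sample hosts this should print the same mappings as the first table (with cisco.com mapping to cisco). The "optional" subdomain variant is left out to keep the pattern readable.</p>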
 
