Extract Second-Level Domain Name from URL

  #1  
Old 11-02-09, 12:03 PM
whaffenden whaffenden is offline
Newbie Coder
 
Join Date: Nov 2009
Location: England
Posts: 11
Thanks: 1
Thanked 0 Times in 0 Posts

Hi Everyone,

I've been searching the internet for the simplest way to extract a second-level domain name from a URL string. I have the following regular expression, which seems to do the job pretty well; the obvious main drawback is the limited TLD support.

PHP Code:
function getDomainName($url)
{
    return preg_replace('/^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org))(\:[0-9]{2,5})?\/*.*$/is', '$1', $url);
}

Firstly, I wondered whether there is a better or faster way of achieving the above, possibly matching more or all TLDs. I would also like to modify the regular expression so that, given just an IP address or localhost (with or without a port), it returns the IP address or localhost as expected.

I hope this helps someone else who may have come across this problem. Thanks in advance for your help.

Thanks,
Wayne
  #2  
Old 11-02-09, 12:27 PM
Keith's Avatar
Keith Keith is offline
Community VIP
 
Join Date: Feb 2004
Posts: 1,168
Thanks: 0
Thanked 1 Time in 1 Post
__________________
The toxic ZCE
  #3  
Old 11-02-09, 03:55 PM
whaffenden
The parse_url function only gives you the HTTP host; I would just like to extract the second-level domain name from any given URL. Below are some examples of what I hope to achieve:

Code:
http://www.example.com -> example.com
http://subdomain.subdomain.example.com -> example.com
https://example.com -> example.com
http://www.example.com:90 -> example.com
http://www.example.co.uk/directory/file.php -> example.co.uk
http://localhost -> localhost
http://67.78.34.23 -> 67.78.34.23
Thanks,
Wayne

Last edited by whaffenden; 11-02-09 at 03:58 PM. Reason: Changed example URLs
  #4  
Old 11-02-09, 05:11 PM
Keith
parse_url() will give you a lot more than that. Did you care to read the documentation?
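
As a rough illustration of what parse_url() hands back besides the host, here is a small sketch. The URL is one of the examples from earlier in the thread with a port, query string and fragment tacked on for demonstration; the variable names are only illustrative.

```php
<?php
// Illustrative URL, based on the thread's examples with extra components added.
$url = 'http://www.example.co.uk:90/directory/file.php?id=1#top';

// With no second argument, parse_url() returns an associative array
// containing only the components that are present in the URL.
$parts = parse_url($url);

foreach ($parts as $name => $value) {
    echo $name, ' => ', $value, PHP_EOL;
}

// A single component can be requested with a PHP_URL_* constant instead.
$host = parse_url($url, PHP_URL_HOST); // string
$port = parse_url($url, PHP_URL_PORT); // integer
```

The port is already separated from the host, so only the hostname itself still needs trimming down to the second-level domain.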

Aside from that, removing the www. subdomain is easy. The tricky part will be removing subdomains not part of the desired domain. You will indeed need to specify any top-level extensions you expect to be used... or just guess using the extension length, which wouldn't be entirely trustworthy if you plan on using shorter 2nd-level (cnn.com, go.com) or longer 1st-level (example.travel, example.museum).
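
To make the length-guessing caveat concrete, here is a rough sketch of such a heuristic. The function name guess_domain_by_length() and the "three characters or fewer" rule are assumptions picked purely for illustration, not a recommended approach.

```php
<?php
/**
 * Hypothetical heuristic: if the label in front of the TLD is three
 * characters or shorter, assume it belongs to a two-part TLD like co.uk.
 */
function guess_domain_by_length($host)
{
    $parts = explode('.', strtolower($host));

    // One or two labels: nothing to strip.
    if (count($parts) < 3) {
        return implode('.', $parts);
    }

    // Keep three labels when the second-to-last one is short, otherwise two.
    $secondLast = $parts[count($parts) - 2];
    $keep = (strlen($secondLast) <= 3) ? 3 : 2;

    return implode('.', array_slice($parts, -$keep));
}

echo guess_domain_by_length('www.example.co.uk'), PHP_EOL; // example.co.uk
echo guess_domain_by_length('www.example.com'), PHP_EOL;   // example.com
echo guess_domain_by_length('edition.cnn.com'), PHP_EOL;   // edition.cnn.com (wrong, should be cnn.com)
echo guess_domain_by_length('www.go.com'), PHP_EOL;        // www.go.com (wrong, should be go.com)
```

The last two calls show the problem with short second-level names: the guess cannot tell cnn.com apart from a two-part TLD.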

Last edited by Keith; 11-02-09 at 05:19 PM.
  #5  
Old 11-02-09, 06:45 PM
whaffenden
Yeah, I did have a look, and I agree it brings me a stage forward, as it splits the URL into the appropriate elements. If I were to use the function and then take the host element, how would I then extract the second-level domain given an array of TLDs?
  #6  
Old 11-02-09, 08:23 PM
Keith
OK, I've tried to comment this as clearly as possible. The only thing you'll need to add is a bit of research into all the TLDs that are dot-separated, adding them to the static $doubleTlds variable. I've taken the liberty of including the most obvious one: co.uk

PHP Code:
/**
 * get_2nd_level_name( string $url )
 * Attempts to establish the 2nd level domain of a given URL
 *
 * @return string 2nd-level domain on success, or FALSE on failure
 */
function get_2nd_level_name( $url )
{
    // a list of dot-separated TLDs
    static $doubleTlds = array(
        'co.uk',
    );

    // sanitize the URL
    $url = trim( $url );

    // if no hostname, use the current by default
    if ( empty( $url ) || '/' == $url[0] )
    {
        $url = $_SERVER['HTTP_HOST'] . $url;
    }

    // if no scheme, use `http://` by default
    if ( FALSE === strpos( $url, '://' ) )
    {
        $url = 'http://' . $url;
    }

    // can we successfully parse the URL?
    if ( $host = parse_url( $url, PHP_URL_HOST ) )
    {
        // is this an IP?
        if ( preg_match( '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/', $host ) )
        {
            return $host;
        }

        // sanitize the hostname
        $host = strtolower( $host );

        // explode on the dots
        $parts = explode( '.', $host );

        // is there just one part? (`localhost`, etc)
        if ( ! isset( $parts[1] ) )
        {
            return $parts[0];
        }

        // grab the TLD
        $tld = array_pop( $parts );

        // grab the hostname
        $host = array_pop( $parts ) . '.' . $tld;

        // have we collected a double TLD?
        if ( ! empty( $parts ) && in_array( $host, $doubleTlds ) )
        {
            $host = array_pop( $parts ) . '.' . $host;
        }

        // send it on its way
        return $host;
    }

    // at this point, nah
    return FALSE;
}


Last edited by Keith; 11-02-09 at 08:26 PM.
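
One aside on the IP check in the function above: the \d{1,3} pattern also accepts out-of-range values such as 999.1.1.1. A possible alternative, sketched here on the assumption that PHP's filter extension is available, is filter_var() with FILTER_VALIDATE_IP. The helper name is_ip_host() is made up for this example.

```php
<?php
/**
 * Hypothetical helper: TRUE if $host is a valid IPv4 or IPv6 address.
 */
function is_ip_host($host)
{
    // IPv6 literals are wrapped in square brackets inside URLs, e.g. `[::1]`.
    $host = trim($host, '[]');

    // filter_var() returns the address on success and FALSE on failure.
    return filter_var($host, FILTER_VALIDATE_IP) !== false;
}

var_dump(is_ip_host('67.78.34.23')); // bool(true)
var_dump(is_ip_host('999.1.1.1'));   // bool(false)
var_dump(is_ip_host('localhost'));   // bool(false)
```

Whether the stricter check matters depends on where the URLs come from; for trusted input the regex is probably good enough.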
The Following User Says Thank You to Keith For This Useful Post:
whaffenden (11-04-09)
  #7  
Old 11-03-09, 12:40 PM
whaffenden
Wow, that's some impressive code! This function works perfectly. Thank you very much for all your help, I really appreciate it. I've added as many double TLDs as I could find to your function below to help anyone else who may need to achieve the same thing.

PHP Code:
/**
 * get_2nd_level_name( string $url )
 * Attempts to establish the 2nd level domain of a given URL
 *
 * @return string 2nd-level domain on success, or FALSE on failure
 */
function get_2nd_level_name( $url )
{
    // a list of dot-separated TLDs
    static $doubleTlds = array(
        'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk',
        'ac.uk', 'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk',
        'asn.au', 'com.au', 'net.au', 'id.au', 'org.au',
        'edu.au', 'gov.au', 'csiro.au', 'br.com', 'com.cn',
        'com.tw', 'cn.com', 'de.com', 'eu.com', 'hu.com',
        'idv.tw', 'net.cn', 'no.com', 'org.cn', 'org.tw',
        'qc.com', 'ru.com', 'sa.com', 'se.com', 'se.net',
        'uk.com', 'uk.net', 'us.com', 'uy.com', 'za.com'
    );

    // sanitize the URL
    $url = trim( $url );

    // if no hostname, use the current by default
    if ( empty( $url ) || '/' == $url[0] )
    {
        $url = $_SERVER['HTTP_HOST'] . $url;
    }

    // if no scheme, use `http://` by default
    if ( FALSE === strpos( $url, '://' ) )
    {
        $url = 'http://' . $url;
    }

    // can we successfully parse the URL?
    if ( $host = parse_url( $url, PHP_URL_HOST ) )
    {
        // is this an IP?
        if ( preg_match( '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/', $host ) )
        {
            return $host;
        }

        // sanitize the hostname
        $host = strtolower( $host );

        // explode on the dots
        $parts = explode( '.', $host );

        // is there just one part? (`localhost`, etc)
        if ( ! isset( $parts[1] ) )
        {
            return $parts[0];
        }

        // grab the TLD
        $tld = array_pop( $parts );

        // grab the hostname
        $host = array_pop( $parts ) . '.' . $tld;

        // have we collected a double TLD?
        if ( ! empty( $parts ) && in_array( $host, $doubleTlds ) )
        {
            $host = array_pop( $parts ) . '.' . $host;
        }

        // send it on its way
        return $host;
    }

    // at this point, nah
    return FALSE;
}

Thanks again,
Wayne
  #9  
Old 11-04-09, 11:52 AM
whaffenden
Wow, that's some very impressive code, it works like a dream. I've added as many double TLDs as I could find below to help anyone else who wants to achieve the same thing.

PHP Code:
static $doubleTlds = array(
    'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk',
    'ac.uk', 'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk',
    'asn.au', 'com.au', 'net.au', 'id.au', 'org.au',
    'edu.au', 'gov.au', 'csiro.au', 'br.com', 'com.cn',
    'com.tw', 'cn.com', 'de.com', 'eu.com', 'hu.com',
    'idv.tw', 'net.cn', 'no.com', 'org.cn', 'org.tw',
    'qc.com', 'ru.com', 'sa.com', 'se.com', 'se.net',
    'uk.com', 'uk.net', 'us.com', 'uy.com', 'za.com'
);
Thanks again Keith for all your help, I really appreciate it.

Thanks,
Wayne
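
For anyone wanting to take the list idea a little further, here is a rough sketch of the same approach using a longest-suffix match. The function name domain_by_longest_suffix() and the short $suffixes list are assumptions for illustration only. It expects a bare hostname (for example from parse_url()) rather than a full URL, and it looks suffixes up as array keys instead of scanning with in_array(), so suffixes with more than two labels would also work.

```php
<?php
/**
 * Hypothetical variant: reduce a bare hostname to "name + suffix" using a
 * longest-suffix match against a caller-supplied list of multi-part TLDs.
 */
function domain_by_longest_suffix($host, array $suffixes)
{
    $host = strtolower(trim($host, '.'));

    // IP addresses and single-label hosts (`localhost`) are returned as-is.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false || strpos($host, '.') === false) {
        return $host;
    }

    // Flip the list so each suffix becomes an array key for fast lookups.
    $lookup = array_flip($suffixes);
    $labels = explode('.', $host);
    $count  = count($labels);

    // Try the longest candidate first, always leaving one label in front of it.
    for ($i = 1; $i < $count - 1; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($lookup[$candidate])) {
            return $labels[$i - 1] . '.' . $candidate;
        }
    }

    // No listed suffix matched: fall back to the last two labels.
    return implode('.', array_slice($labels, -2));
}

// Example run against some of the URLs listed earlier in the thread.
$suffixes = array('co.uk', 'org.uk', 'com.au');
$urls = array(
    'http://www.example.com',
    'http://subdomain.subdomain.example.com',
    'http://www.example.com:90',
    'http://www.example.co.uk/directory/file.php',
    'http://localhost',
    'http://67.78.34.23',
);

foreach ($urls as $url) {
    $host = parse_url($url, PHP_URL_HOST);
    echo $url, ' -> ', domain_by_longest_suffix($host, $suffixes), PHP_EOL;
}
```

The list still has to be kept up to date by hand, so this is only as complete as the suffixes passed in.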

All times are GMT -5.