Extract Second-Level Domain Name from URL

11-02-09, 12:03 PM
Newbie Coder
Join Date: Nov 2009
Location: England
Posts: 11
Thanks: 1
Thanked 0 Times in 0 Posts
Hi Everyone,
I've been searching the internet for the simplest way to extract a second-level domain name from a URL string. I have the following regular expression, which seems to do the job pretty well; the obvious drawback is its limited TLD support.
PHP Code:
function getDomainName($url)
{
    return preg_replace('/^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org))(\:[0-9]{2,5})?\/*.*$/is', '$1', $url);
}
First, I wondered whether there is a better or faster way of achieving the above, possibly matching more or all TLDs. I would also like to modify the regular expression so that, given just an IP address or localhost (with or without a port), it returns the IP address or localhost as expected.
I hope this helps anyone else who has come across this problem. Thanks in advance for your help.
Thanks,
Wayne

11-02-09, 12:27 PM
Community VIP
Join Date: Feb 2004
Posts: 1,168
Thanks: 0
Thanked 1 Time in 1 Post
__________________
The toxic ZCE

11-02-09, 03:55 PM
The parse_url function only gives you the HTTP host; I would just like to extract the second-level domain name from any given URL. Below are some examples of what I hope to achieve:
Code:
http://www.example.com -> example.com
http://subdomain.subdomain.example.com -> example.com
https://example.com -> example.com
http://www.example.com:90 -> example.com
http://www.example.co.uk/directory/file.php -> example.co.uk
http://localhost -> localhost
http://67.78.34.23 -> 67.78.34.23
Thanks,
Wayne
Last edited by whaffenden; 11-02-09 at 03:58 PM.
Reason: Changed example URLs

11-02-09, 05:11 PM
parse_url() will give you a lot more than that. Did you care to read the documentation?
Aside from that, removing the www. subdomain is easy. The tricky part is removing subdomains that are not part of the desired domain. You will indeed need to specify any top-level extensions you expect to be used... or just guess using the extension length, which wouldn't be entirely trustworthy if you plan on handling shorter 2nd-level names (cnn.com, go.com) or longer 1st-level ones (example.travel, example.museum).
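To illustrate why guessing by length is unreliable, here is a minimal sketch of one such heuristic. The function name guess_domain_by_length and the three-character threshold are assumptions made for this example, not something proposed in the thread.

```php
<?php
// Hypothetical helper: guess the domain from label lengths alone,
// without any list of known suffixes.
function guess_domain_by_length( string $host ): string {
    $parts = explode( '.', strtolower( $host ) );
    $count = count( $parts );
    if ( $count < 3 ) {
        return implode( '.', $parts );
    }
    // Assumption: a second-to-last label of 3 characters or fewer
    // (co, com, org, ...) is treated as part of a two-part suffix.
    $take = strlen( $parts[ $count - 2 ] ) <= 3 ? 3 : 2;
    return implode( '.', array_slice( $parts, -$take ) );
}

echo guess_domain_by_length( 'www.example.co.uk' ), "\n"; // example.co.uk (correct)
echo guess_domain_by_length( 'www.cnn.com' ), "\n";       // www.cnn.com (wrong: "cnn" is mistaken for part of the suffix)
echo guess_domain_by_length( 'shop.go.com' ), "\n";       // shop.go.com (wrong for the same reason)
```

Any length cut-off misclassifies some real names, which is why an explicit list of suffixes tends to be the safer route.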
Last edited by Keith; 11-02-09 at 05:19 PM.

11-02-09, 06:45 PM
Yeah, I did have a look, and I agree it brings me a step forward, as it splits the URL into its component elements. Supposing I use the function and take the host element, how would I then extract the second-level domain given an array of TLDs?
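As a rough sketch of that first step, this is what parse_url() gives back for one of the example URLs (with a port added for illustration). The variable names are arbitrary.

```php
<?php
// Example URL based on the list earlier in the thread, with a port added.
$url = 'http://www.example.co.uk:90/directory/file.php';

// With no second argument, parse_url() returns an associative array;
// here the scheme, host, port and path keys are set.
$parts = parse_url( $url );

// With a component constant, it returns just that piece.
$host = parse_url( $url, PHP_URL_HOST );   // "www.example.co.uk"
$port = parse_url( $url, PHP_URL_PORT );   // int 90

// Splitting the host on dots gives the labels to compare against a TLD list.
$labels = explode( '.', $host );           // ["www", "example", "co", "uk"]
```

From here the problem reduces to deciding how many trailing labels belong to the TLD.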

11-02-09, 08:23 PM
OK, I've tried to comment this as clearly as possible. The only thing you'll need to add is a bit of research into all the TLDs that are dot-separated, adding them to the static $doubleTlds variable. I've taken the liberty of including the most obvious one: co.uk
PHP Code:
/**
 * get_2nd_level_name( string $url )
 * Attempts to establish the 2nd level domain of a given URL
 *
 * @return string 2nd-level domain on success, or FALSE on failure
 */
function get_2nd_level_name( $url ) {
    // a list of dot-separated TLDs
    static $doubleTlds = array(
        'co.uk',
    );
    // sanitize the URL
    $url = trim( $url );
    // if no hostname, use the current by default
    if ( empty( $url ) || '/' == $url[0] ) {
        $url = $_SERVER['HTTP_HOST'] . $url;
    }
    // if no scheme, use `http://` by default
    if ( FALSE === strpos( $url, '://' ) ) {
        $url = 'http://' . $url;
    }
    // can we successfully parse the URL?
    if ( $host = parse_url( $url, PHP_URL_HOST ) ) {
        // is this an IP?
        if ( preg_match( '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/', $host ) ) {
            return $host;
        }
        // sanitize the hostname
        $host = strtolower( $host );
        // explode on the dots
        $parts = explode( '.', $host );
        // is there just one part? (`localhost`, etc)
        if ( ! isset( $parts[1] ) ) {
            return $parts[0];
        }
        // grab the TLD
        $tld = array_pop( $parts );
        // grab the hostname
        $host = array_pop( $parts ) . '.' . $tld;
        // have we collected a double TLD?
        if ( ! empty( $parts ) && in_array( $host, $doubleTlds ) ) {
            $host = array_pop( $parts ) . '.' . $host;
        }
        // send it on its way
        return $host;
    }
    // at this point, nah
    return FALSE;
}
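One detail worth noting: the \d{1,3} pattern used for the IP check also accepts out-of-range octets such as 999.999.999.999. A possible alternative, sketched here under the assumption that only IPv4 hosts matter, is PHP's built-in filter_var() with FILTER_VALIDATE_IP. The helper name is_ipv4_host is made up for this example.

```php
<?php
// Hypothetical helper: true only for a syntactically valid IPv4 address.
function is_ipv4_host( string $host ): bool {
    return false !== filter_var( $host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 );
}

var_dump( is_ipv4_host( '67.78.34.23' ) );      // bool(true)
var_dump( is_ipv4_host( '999.999.999.999' ) );  // bool(false); the \d{1,3} pattern accepts this
var_dump( is_ipv4_host( 'example.com' ) );      // bool(false)
```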
Last edited by Keith; 11-02-09 at 08:26 PM.

11-03-09, 12:40 PM
Wow, that's some impressive code! This function works perfectly. Thank you very much for all your help, I really appreciate it. I've added as many double TLDs as I could find to your function below to help anyone else who may need to achieve the same thing.
PHP Code:
/**
 * get_2nd_level_name( string $url )
 * Attempts to establish the 2nd level domain of a given URL
 *
 * @return string 2nd-level domain on success, or FALSE on failure
 */
function get_2nd_level_name( $url )
{
    // a list of dot-separated TLDs
    static $doubleTlds = array(
        'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk',
        'ac.uk', 'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk',
        'asn.au', 'com.au', 'net.au', 'id.au', 'org.au',
        'edu.au', 'gov.au', 'csiro.au', 'br.com', 'com.cn',
        'com.tw', 'cn.com', 'de.com', 'eu.com', 'hu.com',
        'idv.tw', 'net.cn', 'no.com', 'org.cn', 'org.tw',
        'qc.com', 'ru.com', 'sa.com', 'se.com', 'se.net',
        'uk.com', 'uk.net', 'us.com', 'uy.com', 'za.com'
    );
    // sanitize the URL
    $url = trim( $url );
    // if no hostname, use the current by default
    if ( empty( $url ) || '/' == $url[0] )
    {
        $url = $_SERVER['HTTP_HOST'] . $url;
    }
    // if no scheme, use `http://` by default
    if ( FALSE === strpos( $url, '://' ) )
    {
        $url = 'http://' . $url;
    }
    // can we successfully parse the URL?
    if ( $host = parse_url( $url, PHP_URL_HOST ) )
    {
        // is this an IP?
        if ( preg_match( '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/', $host ) )
        {
            return $host;
        }
        // sanitize the hostname
        $host = strtolower( $host );
        // explode on the dots
        $parts = explode( '.', $host );
        // is there just one part? (`localhost`, etc)
        if ( ! isset( $parts[1] ) )
        {
            return $parts[0];
        }
        // grab the TLD
        $tld = array_pop( $parts );
        // grab the hostname
        $host = array_pop( $parts ) . '.' . $tld;
        // have we collected a double TLD?
        if ( ! empty( $parts ) && in_array( $host, $doubleTlds ) )
        {
            $host = array_pop( $parts ) . '.' . $host;
        }
        // send it on its way
        return $host;
    }
    // at this point, nah
    return FALSE;
}
Thanks again,
Wayne
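The list above only covers two-part suffixes, and in_array() scans it linearly on every call. As a possible variation (not part of Keith's function), the sketch below tries the longest candidate suffix first against a set keyed by suffix, so suffixes of any depth can be handled. The function name registrable_domain and the small $suffixes sample are assumptions for illustration; a fuller list, such as the Public Suffix List, would be needed in practice, and this sketch ignores that list's wildcard and exception rules.

```php
<?php
// Hypothetical helper: return the label directly left of the longest known
// suffix, plus that suffix. $suffixes is a set keyed by suffix for fast lookup.
function registrable_domain( string $host, array $suffixes ): string {
    $labels = explode( '.', strtolower( $host ) );
    $count  = count( $labels );
    // Try the longest candidate suffix first: drop one leading label at a
    // time, always leaving at least one label in front of the suffix.
    for ( $i = 1; $i < $count; $i++ ) {
        $candidate = implode( '.', array_slice( $labels, $i ) );
        if ( isset( $suffixes[ $candidate ] ) ) {
            return $labels[ $i - 1 ] . '.' . $candidate;
        }
    }
    // No known suffix: fall back to the last two labels (or the bare host).
    return implode( '.', array_slice( $labels, -2 ) );
}

// Illustrative sample only, not a complete list.
$suffixes = array_flip( array( 'com', 'uk', 'co.uk', 'org.uk', 'com.au' ) );

echo registrable_domain( 'www.example.co.uk', $suffixes ), "\n"; // example.co.uk
echo registrable_domain( 'a.b.example.com', $suffixes ), "\n";   // example.com
echo registrable_domain( 'localhost', $suffixes ), "\n";         // localhost
```

IP addresses would still need to be filtered out before calling a helper like this, as the original function does.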

11-04-09, 11:52 AM
Wow, that's some very impressive code; it works like a dream. I've added as many double TLDs as I could find below to help anyone else who wants to achieve the same thing.
PHP Code:
static $doubleTlds = array(
    'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk',
    'ac.uk', 'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk',
    'asn.au', 'com.au', 'net.au', 'id.au', 'org.au',
    'edu.au', 'gov.au', 'csiro.au', 'br.com', 'com.cn',
    'com.tw', 'cn.com', 'de.com', 'eu.com', 'hu.com',
    'idv.tw', 'net.cn', 'no.com', 'org.cn', 'org.tw',
    'qc.com', 'ru.com', 'sa.com', 'se.com', 'se.net',
    'uk.com', 'uk.net', 'us.com', 'uy.com', 'za.com'
);
Thanks again, Keith, for all your help. I really appreciate it.
Thanks,
Wayne