I've been searching the internet for the simplest way to extract a second-level domain name from a URL string. I have the following regular expression, which seems to do the job pretty well; the obvious drawback is its limited TLD support.
PHP Code:
function getDomainName($url)
{
return preg_replace('/^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org))(\:[0-9]{2,5})?\/*.*$/is', '$1', $url);
}
First, I wondered whether there is a better or faster way of achieving the above, possibly matching more or all TLDs. I would also like to modify the regular expression so that, given just an IP address or localhost (with or without a port), it simply returns the IP address or localhost.
I hope this helps someone else who has come across this problem. Thanks in advance for your help.
The parse_url function only gives you the HTTP host; I would just like to extract the second-level domain name from any given URL. Below are some examples of what I hope to achieve:
parse_url() will give you a lot more than that. Did you care to read the documentation?
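For reference, here is a minimal sketch of what parse_url() returns for a full URL. The URL is made up purely for illustration; only the parse_url() call and the PHP_URL_* constants are standard PHP.

```php
<?php
// Hypothetical URL, used only for illustration.
$url = 'http://www.example.co.uk:8080/path/page.php?id=1#top';

// With no second argument, parse_url() returns an associative array
// containing only the components that are present in the URL
// (scheme, host, port, path, query, fragment, ...).
$parts = parse_url($url);
print_r($parts);

// A single component can be requested with one of the PHP_URL_* constants.
$host = parse_url($url, PHP_URL_HOST); // "www.example.co.uk"
$port = parse_url($url, PHP_URL_PORT); // 8080 (an integer)
```

The host still includes every subdomain, so it is a starting point for the extraction rather than the final answer.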
Aside from that, removing the www. subdomain is easy. The tricky part will be removing subdomains not part of the desired domain. You will indeed need to specify any top-level extensions you expect to be used... or just guess using the extension length, which wouldn't be entirely trustworthy if you plan on using shorter 2nd-level (cnn.com, go.com) or longer 1st-level (example.travel, example.museum).
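To illustrate why guessing is not trustworthy, here is a rough sketch of the simplest guess: keep the last two dot-separated labels of the host. The helper name naive_last_two_labels() is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: keep only the last two dot-separated labels.
function naive_last_two_labels(string $host): string
{
    $parts = explode('.', strtolower($host));

    // array_slice() with a negative offset takes labels from the end.
    return implode('.', array_slice($parts, -2));
}

echo naive_last_two_labels('www.cnn.com'), "\n";       // cnn.com
echo naive_last_two_labels('www.example.co.uk'), "\n"; // co.uk (the actual name is lost)
```

The second call shows the problem: without a list of multi-part extensions such as co.uk, the function cannot tell where the registered name ends.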
Yeah, I did have a look, and I agree it brings me a stage forward, as it splits the URL into the appropriate elements. If I were to use the function and then take the host element, how would I extract the second-level domain given an array of TLDs?
OK, I've tried to comment this as clearly as possible. The only thing you'll need to add is a bit of research into all the dot-separated (multi-part) TLDs, which you can then add to the static $doubleTlds variable. I've taken the liberty of including the most obvious one: co.uk
PHP Code:
/**
 * get_2nd_level_name( string $url )
 * Attempts to establish the 2nd level domain of a given URL
 *
 * @return string 2nd-level domain on success, or FALSE on failure
 */
function get_2nd_level_name( $url ) {
    // a list of decimal-separated TLDs
    static $doubleTlds = array(
        'co.uk',
    );

    // sanitize the URL
    $url = trim( $url );

    // if no hostname, use the current by default
    if ( empty( $url ) || '/' == $url[0] ) {
        $url = $_SERVER['HTTP_HOST'] . $url;
    }

    // if no scheme, use `http://` by default
    if ( FALSE === strpos( $url, '://' ) ) {
        $url = 'http://' . $url;
    }

    // can we successfully parse the URL?
    if ( $host = parse_url( $url, PHP_URL_HOST ) ) {
        // is this an IP?
        if ( preg_match( '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/', $host ) ) {
            return $host;
        }

        // sanitize the hostname
        $host = strtolower( $host );

        // explode on the decimals
        $parts = explode( '.', $host );

        // is there just one part? (`localhost`, etc)
        if ( ! isset( $parts[1] ) ) {
            return $parts[0];
        }
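The listing above breaks off before the part that deals with $doubleTlds. What follows is only a sketch of how the whole approach could fit together, not the original poster's remaining code. The function name get_2nd_level_name_sketch(), the $doubleTlds parameter, and everything after the single-label check are assumptions, including the choice to return the name together with its extension (example.co.uk).

```php
<?php
// Hypothetical, self-contained variant of the function above.
function get_2nd_level_name_sketch($url, array $doubleTlds = array('co.uk'))
{
    $url = trim($url);

    // Empty or relative input: fall back to the current host, if there is one.
    if ($url === '' || $url[0] === '/') {
        if (empty($_SERVER['HTTP_HOST'])) {
            return false; // assumption: nothing sensible to return (e.g. on the CLI)
        }
        $url = $_SERVER['HTTP_HOST'] . $url;
    }

    // parse_url() needs a scheme to recognise the host part.
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // Dotted-quad IPv4 addresses are returned untouched.
    if (preg_match('/^\d{1,3}(?:\.\d{1,3}){3}$/', $host)) {
        return $host;
    }

    $parts = explode('.', strtolower($host));
    $count = count($parts);

    // Single label such as `localhost`.
    if ($count === 1) {
        return $parts[0];
    }

    // Assumed continuation: keep one extra label when the last two
    // labels form a known multi-part extension.
    $lastTwo = $parts[$count - 2] . '.' . $parts[$count - 1];
    if ($count >= 3 && in_array($lastTwo, $doubleTlds, true)) {
        return $parts[$count - 3] . '.' . $lastTwo;
    }

    return $lastTwo;
}

echo get_2nd_level_name_sketch('http://www.example.co.uk:8080/path'), "\n"; // example.co.uk
echo get_2nd_level_name_sketch('localhost:8080'), "\n";                     // localhost
```

Passing the extension list as a parameter rather than a static variable is a design choice made here so the list can be extended without editing the function.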
Wow, that's some impressive code! This function works perfectly. Thank you very much for all your help, I really appreciate it. I've added as many double TLDs as I could find to your function below, to help anyone else who may need to achieve the same thing.
PHP Code:
/**
* get_2nd_level_name( string $url )
* Attempts to establish the 2nd level domain of a given URL
*
* @return string 2nd-level domain on success, or FALSE on failure
*/