Pattern matching with Regular Expressions

Regular expressions have long been available to UNIX shell programmers and in scripting languages such as Perl and PHP. Since JavaScript 1.2 you can tap into their power in the browser too, which makes them especially handy for client-side validation.

Regular Expressions in Javascript

The RegExp object is the parent to the regular expression object. RegExp has a constructor function that instantiates a Regular Expression object, much like the Date constructor instantiates a new date. To create a Regular Expression object, you would use the following syntax:

var myregexp = new RegExp("pattern"[, "switch"]);

or, you could use the alternative syntax:

var myregexp = /pattern/[switch]
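
As a minimal sketch of the two forms side by side (the variable names and the sample pattern are illustrative only): in the constructor form the pattern is an ordinary string, so every backslash has to be doubled, whereas the literal form takes the pattern as written.

```javascript
// The same pattern (two digits followed by one letter, case-insensitive)
// built with the constructor and with the literal syntax.
var fromConstructor = new RegExp("^\\d{2}[a-z]$", "i"); // backslash doubled inside a string
var fromLiteral = /^\d{2}[a-z]$/i;

console.log(fromConstructor.test("12A")); // true
console.log(fromLiteral.test("12A"));     // true
console.log(fromConstructor.source === fromLiteral.source); // true
```

The constructor form is mainly useful when the pattern has to be assembled at run time; for a fixed pattern the literal form is usually easier to read.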

To use the Regular Expression object to validate user input you must define a pattern string that represents the search criteria. Pattern strings are defined using string literal characters and metacharacters. For example, to determine if a string contained a valid date you could use the following search pattern:

/^\d{1,2}(\-|\/|\.)\d{1,2}\1(\d{2}|\d{4})$/

This would successfully match any of the following:

1.1.1980
01-01-1980
1/11/80

(Note, however, that the pattern only checks the format; it doesn’t guarantee that the date is valid. 32.1.80 would also match, so more work must be done!)

Let’s look at the pattern in more detail:

^ indicates the beginning of the string.
\d indicates a digit character and the {1,2} following it means that there must be one or two consecutive digit characters.
(\-|\/|\.) says to match either a hyphen, or a forward slash, or a full stop. The pipe (|) character between the date separators means ‘or’, and the backslash (\) before each character escapes it, which means ‘treat this character as a literal rather than as a metacharacter’. The parentheses also capture whichever separator was matched so that it can be referred to later in the pattern.
\d{1,2} is, as mentioned previously, one or two digits.
\1 is a backreference to the first parenthesised group, which ensures that the second date separator matches the first one.
(\d{2}|\d{4}) is similar to the other digit patterns, but this time it matches either two or four consecutive digit characters.

and finally,

$ indicates the end of the string.
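
The breakdown above can be checked with a short sketch like the following. The variable name datePattern and the extra sample strings are assumptions added for illustration; the pattern itself is the one discussed above.

```javascript
var datePattern = /^\d{1,2}(\-|\/|\.)\d{1,2}\1(\d{2}|\d{4})$/;

console.log(datePattern.test("1.1.1980"));   // true
console.log(datePattern.test("01-01-1980")); // true
console.log(datePattern.test("1/11/80"));    // true
console.log(datePattern.test("1/11-80"));    // false: the separators differ, so \1 fails
console.log(datePattern.test("1.1.198"));    // false: the year must be 2 or 4 digits
console.log(datePattern.test("32.1.80"));    // true: the format is right even though the day is not
```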

Categories Of Regular Expression Pattern Characters

Pattern-matching characters can be grouped into several categories. For more details of what pattern characters can be used, see here:

http://www.imps-group.co.uk/ref/js_regexp.pdf

Pattern Switches

In addition to the pattern-matching characters, you can use switches to make the match global, case-insensitive, or both. The following is an example of a pattern string definition that uses a switch:

/\s/g

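This pattern and switch combination matches every occurrence of a whitespace character (\s covers spaces, tabs and line breaks) because it uses the global switch; without it, only the first occurrence would be matched. You can find information on the available switches here: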

http://www.imps-group.co.uk/ref/js_regexp.pdf
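
To make the effect of the switches concrete, here is a small sketch using the string replace method. The sample text and variable names are made up for illustration.

```javascript
var text = "a b c d";

// Without the g switch, only the first match is replaced.
var firstOnly = text.replace(/\s/, "_");
// With the g switch, every match is replaced.
var allMatches = text.replace(/\s/g, "_");
// The i switch makes the match case-insensitive.
var ignoresCase = /hello/i.test("HELLO");

console.log(firstOnly);   // a_b c d
console.log(allMatches);  // a_b_c_d
console.log(ignoresCase); // true
```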

Now that you’ve been introduced to regular expressions and patterns, let’s look at a few examples of common validation functions.

UK Phone Number

Assuming the area code and phone number are not separate fields, a valid phone number would consist of a 4 digit area code, an optional space, 3 digits, and another optional space followed by 4 more digits. The first digit of the area code must be a zero. A regular expression to do this might look like this (note that, as written, it also rejects a zero anywhere in the remaining three digits of the area code):

var myregexp  = /^(0{1})([1-9]{3})\s?\d{3}\s?\d{4}$/;
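
A quick sketch of how this pattern behaves, using invented sample numbers and an assumed variable name (ukPhone):

```javascript
var ukPhone = /^(0{1})([1-9]{3})\s?\d{3}\s?\d{4}$/;

console.log(ukPhone.test("0123 456 7890")); // true
console.log(ukPhone.test("01234567890"));   // true: both spaces are optional
console.log(ukPhone.test("1234 567 8901")); // false: the area code must start with 0
console.log(ukPhone.test("0203 456 7890")); // false: [1-9] does not allow the second 0
```

If area codes containing further zeros need to be accepted, replacing [1-9]{3} with \d{3} is one possible adjustment.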

US Zipcode

A valid zip code should consist of either a five digit code or five digits plus four separated by a hyphen. This can be done with a regular expression like this:

var myregexp = /(^\d{5}$)|(^\d{5}-\d{4}$)/
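
As a sketch, the same check can also be written with the four-digit extension as an optional group. The name zipShort and the sample values are assumptions for illustration; the loop shows that both forms agree on each sample.

```javascript
var zip = /(^\d{5}$)|(^\d{5}-\d{4}$)/;
// A shorter form: five digits, optionally followed by a hyphen and four digits.
var zipShort = /^\d{5}(-\d{4})?$/;

var samples = ["90210", "90210-1234", "9021", "90210-12", "902101234"];
for (var i = 0; i < samples.length; i++) {
    // Prints the sample followed by the result from each pattern.
    console.log(samples[i], zip.test(samples[i]), zipShort.test(samples[i]));
}
```

Only the first two samples pass either pattern.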

Integer

A valid integer value should contain only digits plus possibly a leading minus sign for negative numbers. A regular expression to do that would look like this:

var myregexp  = /(^-?\d\d*$)/;
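
A brief sketch of what this pattern accepts and rejects (the variable name integerPattern is illustrative). Note that \d\d* means "one digit followed by zero or more digits", which is the same as \d+.

```javascript
var integerPattern = /(^-?\d\d*$)/;

console.log(integerPattern.test("42"));  // true
console.log(integerPattern.test("-42")); // true
console.log(integerPattern.test("4.2")); // false: a decimal point is not a digit
console.log(integerPattern.test("+42")); // false: only a leading minus sign is allowed
console.log(integerPattern.test(""));    // false: at least one digit is required
```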

Currency

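A currency symbol (£, $ or € in this case) followed by a number made up of digits, decimal points and commas, with an optional leading minus sign. Note that the pattern does not check that the points and commas are sensibly placed: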

var myregexp  = /^-?[\£\$\€][0-9\.\,]+$/;
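
The following sketch shows how loosely this pattern matches. The variable name currency and the sample amounts are assumptions for illustration.

```javascript
var currency = /^-?[\£\$\€][0-9\.\,]+$/;

console.log(currency.test("£1,234.56")); // true
console.log(currency.test("-$10"));      // true: the minus sign comes before the symbol
console.log(currency.test("€5"));        // true
console.log(currency.test("10.00"));     // false: the currency symbol is required
console.log(currency.test("$-10"));      // false: the minus sign cannot follow the symbol
console.log(currency.test("£1..2,,3"));  // true: separator placement is not checked
```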

The uses of regular expressions are virtually limitless. Take a look at these:

var myregexp  = /^.+@.+\..{2,3}$/;

A simple email match.
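
A short sketch of where this simple pattern draws the line; the variable name and addresses are invented for illustration. Because the final part must be two or three characters long, longer top-level domains are rejected, and because . matches almost any character, spaces are accepted.

```javascript
var simpleEmail = /^.+@.+\..{2,3}$/;

console.log(simpleEmail.test("stuart@example.com"));  // true
console.log(simpleEmail.test("stuart@example.co.uk")); // true: the last part is "uk"
console.log(simpleEmail.test("stuart@example"));      // false: no dot after the @
console.log(simpleEmail.test("stuart@example.info")); // false: "info" is four characters
console.log(simpleEmail.test("a b@c d.xy"));          // true: spaces are not excluded
```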

var myregexp  = /^(http|https|ftp):\/\/((?:[a-zA-Z0-9_-]+\.?)+):?(\d*)/;

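Matches the start of a URL: it requires http://, https:// or ftp://, followed by a domain name and an optional port number. Because the pattern has no closing $, anything after that (such as a file or folder path) is accepted without being checked.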
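
Since this pattern contains capturing groups, the exec method can be used to pull the pieces out. This is a sketch with made-up URLs and variable names: exec returns null when there is no match, otherwise an array holding the matched text followed by each captured group.

```javascript
var urlPattern = /^(http|https|ftp):\/\/((?:[a-zA-Z0-9_-]+\.?)+):?(\d*)/;

var withPort = urlPattern.exec("https://www.example.com:8080/docs/index.html");
console.log(withPort[0]); // https://www.example.com:8080 (the path is not part of the match)
console.log(withPort[1]); // https
console.log(withPort[2]); // www.example.com
console.log(withPort[3]); // 8080

var noPort = urlPattern.exec("ftp://example.org/file.txt");
console.log(noPort[2]);        // example.org
console.log(noPort[3] === ""); // true: the port group matched nothing

console.log(urlPattern.exec("mailto:someone@example.com")); // null
```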

Testing Regular Expressions using Javascript

To test a pattern in JavaScript using the regular expression object, we call its ‘test’ method, passing in the string to check, like this:

myregexp.test(string);

Here is a sample function, which returns ‘true’ on a successful validation, or ‘false’ if it spots an error.

function validate(str) {
    var success = true;
    // one or more letters, whitespace characters, commas or hyphens, and nothing else
    var pattern = /^[\sa-zA-Z\,\-]+$/;
    if (!pattern.test(str)) {
        success = false;
    }
    return success;
}

Here are three example calls to our function:

validate("my name is Stuart, Pleased to meet you");

This code snippet would return ‘true’, because it doesn’t contain any characters other than alphabetical characters, a comma or the space character.

validate("I was born in June 1972");

This code would return ‘false’ – it’s got numbers in it!

validate("my email is stuart@example.com");

Again, this would return false, as the email address has characters not allowed in our pattern.

Happy coding!

Author: Stuart Bull
Freelance developer specialising in Open Source application development frameworks and utilising PHP, MySQL, PostgreSQL, XML, XHTML, CSS and Javascript