Regex Operators in Perl

To put it simply, a regex unlocks the power of complete string comparison.  That is, it gives us full control over how we view and manipulate any string (variable) we have.  Regex is short for regular expression, and it takes even advanced programmers a while to understand regexes well enough to write them efficiently and accurately.

Regular expressions are one of the reasons Perl is such a powerful language, and mastering them will give you full control over the data your scripts work with.  Before we begin, here is a simple regex for us to look at.

$variable =~ s/text/TEXT/gi;

The m// operator

The m// operator is how we deal with matching.  It works against the default variable $_ unless you bind it to another variable with =~, which is as easy as writing the variable name in front.  The matching operator works well when you need to know whether a string contains a certain character, group of characters, word, or group of words.  A test such as if ($line eq "test") will not work if all we want to know is whether the word test exists somewhere in $line, so we use m// instead.

The main difference between a simple eq or == and m// is that one tests for equality and the other tests for the existence of the value inside the string.

my $line;
while ($line = <STDIN>)
{
    if ($line =~ m/exit/) { exit; }
}

This example loops until it matches what we’re looking for.  It keeps asking for input, and unless a line contains the word exit (or the input runs out), it’s not going to end for us.  From this you can see where our search gets used: the characters or words we want to match are placed inside the //.

my $text = "a blue cow ate the cheese";
if ($text =~ m/cow/)
{
    print "mooooooo";
}

We are taking a predefined variable $text and seeing if we can match the word cow anywhere in it.  When we run this code we’ll get mooooooo back because it can find the word.

Remember, this matching operator doesn’t test for equality; it checks for existence.
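
To see equality and matching side by side, here is a minimal sketch.  The sample strings and variable names are made up for illustration and are not part of the examples above.

```perl
use strict;
use warnings;

my $line = "this is a test";

# eq is true only when the whole string is exactly "test"
my $is_equal = ($line eq "test") ? 1 : 0;

# m// is true when "test" appears anywhere in the string
my $contains = ($line =~ m/test/) ? 1 : 0;

print "equal: $is_equal, contains: $contains\n";   # equal: 0, contains: 1

# Without =~, m// checks the default variable $_
my $found = 0;
for ("no match here", "a test line") {
    $found++ if m/test/;
}
print "lines containing test: $found\n";           # lines containing test: 1
```

The for loop places each string in $_ in turn, which is why the bare m/test/ has something to look at.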


The s/// operator

The second most used operator is the substitution operator.  This gives us the power and tools to manipulate our information in any way we wish.  We could scan an entire text file and change all the words “red” to “blue” if that’s what we wanted.

This works hand-in-hand with the m// we just learned, in that our words either exist and we can do something with them, or they don’t.  That is to say, we can’t substitute any part of our text unless the text we want to change already exists.
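
One way to watch this happen is to look at what s/// hands back: it returns the number of substitutions it made, which is false when nothing matched.  The following is only a small sketch with a made-up sentence and variable names.

```perl
use strict;
use warnings;

my $text = "my car is red";

# The pattern exists, so the substitution happens and the result is true
my $changed = ($text =~ s/red/blue/) ? 1 : 0;

# The pattern does not exist, so nothing changes and the result is false
my $missed = ($text =~ s/green/yellow/) ? 1 : 0;

print "$text\n";                                  # my car is blue
print "changed: $changed, missed: $missed\n";     # changed: 1, missed: 0
```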

my $line;
while ($line = <STDIN>)
{
    chomp($line);
    $line =~ s/exit/go/;
    print "Did you say $line?\n";
}

We’re doing a bit more work in this example because there is more to a substitution than to matching words or phrases.  This is nearly the same example we used before: if you type any phrase containing the word exit, something will happen.  In this case we are using s/exit/go/, which means that if it finds the word exit, it will be replaced with the word go.

The best way to learn is to do, so run this script a few times and run a few tests.  Type in words that don’t contain exit and some that do so you get familiar with what’s going on.

Unlike the match operator, where we have m/word/, we now have two parts: s/word/newword/.  The second part, between the second and third slashes, is the replacement for whatever matched the first part.

s/this/that/;    # change the word this to that
s/apple/pear/;   # change the word apple to pear
s/I have a red car/I have a red bike/; # change the entire sentence if it matches

A few things to note before we move on: by default, s/// replaces only once and is case-sensitive.  Put simply, if we tried to change the word this to that, it would change only the first occurrence of this, leave the rest untouched, and never match THIS.

my $text = "the rabbit jumped down the hole where the cow lived.";
$text =~ s/the/THE/;
print $text;

This example substitutes the lowercase word the to the uppercase THE.  By running this script you’ll notice that only the first the that’s found gets replaced giving us the result: THE rabbit jumped down the hole where the cow lived.

my $text = "the rabbit jumped down the hole where the cow lived.";
$text =~ s/the/THE/g;
print $text;

Using /g at the end of our substitution means to substitute globally: instead of replacing just the first instance of the word or phrase, we replace it every time it appears in our data.  Taking the same sentence we used before, simply adding the /g modifier to the end replaces every occurrence of the word the and gives the result: THE rabbit jumped down THE hole where THE cow lived.

my $text = "The rabbit jumped down the hole where the cow lived.";
$text =~ s/the/THE/gi;
print $text;

With one small change to our sentence (we capitalized the T in The, the first word), our substitution would normally skip this word and replace only the, because its match is case-sensitive.  The /i modifier changes the default to a case-insensitive substitution.  This will substitute the words The, THe, tHe and so forth with THE, and since we’re still using the global modifier /g, it will change all instances of these words.

Sometimes we want to just remove certain words or phrases instead of substituting them with another word or phrase.  This can be done by leaving the replacement part empty, so the second and third slashes sit side by side.  Doing so tells Perl that you want to substitute the matched words with nothing (an empty substitution), thereby removing the words completely.

my $text = "The rabbit jumped down the hole where the cow lived.";
$text =~ s/the//gi;
print $text;

In this last example, we’re removing the word the in any case and as many times as it can be found in the string.  Only the word itself is removed, so the spaces around it and the final period stay behind.  This will produce the results:

 rabbit jumped down  hole where  cow lived.


The tr/// operator

The translation operator also works on $_ by default.  With it we can make a character-by-character translation.  Where s/// worked on words, numbers and phrases, this operator works solely on characters.
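
As a rough illustration of that difference, the sketch below runs both operators over the same made-up string.  The string and variable names are assumptions chosen only to make the contrast visible.

```perl
use strict;
use warnings;

my $as_phrase = "abba cab";
my $as_chars  = "abba cab";

# s/// looks for the two-character sequence "ab" and replaces it as a unit
$as_phrase =~ s/ab/X/g;

# tr/// treats a and b as separate characters: every a becomes X, every b becomes Y
$as_chars =~ tr/ab/XY/;

print "$as_phrase\n";   # Xba cX
print "$as_chars\n";    # XYYX cXY
```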

my $line;
while ($line = <STDIN>)
{
    chomp($line);
    $line =~ tr/1/0/;
    print "Did you say $line?\n";
}

We are translating each occurrence of the character “1” to “0”.  As with s///, the part between the second and third slashes is what we’re converting our data into if it matches.  For another simple example,

my $text = "bear";
$text =~ tr/b/t/;

Which gives us the result tear as we are replacing the character “b” with “t”.

We can also remove characters from our string instead of swapping them for others.  We do this using the /d (delete) modifier.  We list the characters we want to remove, leave the replacement part empty and append d.

my $text = "This is a line of text";
$text =~ tr/a//d;
print "results: $text";

Take note that the replacement part is to be left empty if you want to delete the characters instead of swapping them with others.  In our example above, we removed all the “a”s from our text, which was just one, however.  A better example would have been to remove an “i” or an “e”, but I’ll leave that up to you to test.

We now have a fairly good understanding of swapping one character with another, and Perl allows us to swap more than one at a time.  That is to say, we can tr/// as few or as many characters at a time as we want.

my $text = "This is the line that never ends. Yes it goes on and on my friend. Some people started writing it, not knowing what it was. And they’ll continue writing it forever just because…this is the line that never ends!";

# no /d here: every listed character has a replacement, so nothing is deleted
$text =~ tr/th/ht/;
print "results: $text";

You will notice we are translating two different characters, the t and the h, and swapping them with h and t.  You can swap as many or as few as you want, as we discussed earlier, but keep in mind that the order matters.  The first character in the first set is swapped with the first character in the second set (our “t” was swapped with “h”), and the second character in the first set is always swapped with the second character in the second set (our “h” was swapped with “t”).
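
The position-by-position pairing is easier to follow on a very short string.  This is just a sketch with a made-up phrase; it also uses the fact that tr/// returns how many characters it translated.

```perl
use strict;
use warnings;

my $text = "the hat";

# t pairs with h and h pairs with t; e, a and the space are not listed, so they stay
my $count = ($text =~ tr/th/ht/);

print "$text\n";                        # hte tah
print "$count characters changed\n";    # 4 characters changed
```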

This example let us switch the h’s and t’s around, making funny text 🙂  These are case-sensitive too: tr/A// will not be the same as tr/a//, and tr/// has no case-insensitive modifier to remedy this.  So you’ll need to use tr/Aa// if you want to catch both cases of the same character.
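
Here is a small sketch of the case issue using the delete modifier.  The word and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $lower_only = "Alabama";
my $both_cases = "Alabama";

# Only lowercase a is listed, so the capital A survives
$lower_only =~ tr/a//d;

# Listing both cases removes every a, upper or lower
$both_cases =~ tr/Aa//d;

print "$lower_only\n";   # Albm
print "$both_cases\n";   # lbm
```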

For our last example, let’s have a little fun and remove all the vowels from our text!  We do that by adding each of the vowels to the first set of // and appending the delete modifier.

my $text = "This is the line that never ends. Yes it goes on and on my friend. Some people started writing it, not knowing what it was. And they’ll continue writing it forever just because…this is the line that never ends!";

$text =~ tr/aeiou//d;
print "results: $text";

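We get the results (LOL):

results: Ths s th ln tht nvr nds. Ys t gs n nd n my frnd. Sm ppl strtd wrtng t, nt knwng wht t ws. And thy’ll cntn wrtng t frvr jst bcs…ths s th ln tht nvr nds!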


Author: Syperder Co
I waltzed into the Web Design community as a professional when I was just 15 years of age, founding SpyderWebDesigns. Through the years my interests shifted from web development to backbone and user interaction.

In 2000 Sulfericacid.com was born: the world’s largest free and 100% ad-free web site, where you could use and download 24 Perl and CGI scripts along with tutorials without limits or restrictions. In January 2005 the site was renamed to SpyderScripts.com as a subsidiary of SpyderCo.

In 2001 I also founded an SEO company SpyderSubmission.com. We’ve helped nearly 2300 web sites achieve higher rankings than they ever could have imagined since our launch four years ago.

On a more personal note, I’ve attained 28 certifications from BrainBench.com and about 40 certifications in total from all resources. One of these is a near Masters in Perl which ranks second highest test score in the state and 17th throughout the country.

I have Perl Abbot status on PerlMonks.org and am working on getting my Perl Saint status this fall.