Using sub-patterns in regular expressions

Sub-patterns let you extract just one part of a matched piece of text; they are patterns within patterns. In the following example I show you how to extract the text "This text is what I want to filter." from the HTML below.

<html>
<body>
This is a sample text to show you how to filter 
some text surrounded by quotes. "Here's a dummy text to 
make our example more complicated". Now here's the text we 
are interested in: "This text is what I want to filter."
</body>
</html>

There are several PHP functions that handle regular expressions, but preg_match_all is the one we're interested in here.

Let's look at the sub-pattern first. We're looking for a run of characters surrounded by quotes that does not contain a quote itself.

"([^"]+)"
() defines a capturing group, i.e. the sub-pattern
[^"] matches any character that is not a quote
+ matches the preceding element one or more times
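To see why the sub-pattern alone isn't enough, here is a small sketch that runs it against the sample HTML from above (copied into a nowdoc string; the variable names are my own). Both quoted passages match, including the dummy text that spans a line break, because a negated class like [^"] also matches newlines.

```php
<?php
// The sample HTML from the article, stored in a nowdoc (no escaping needed).
$input = <<<'HTML'
<html>
<body>
This is a sample text to show you how to filter
some text surrounded by quotes. "Here's a dummy text to
make our example more complicated". Now here's the text we
are interested in: "This text is what I want to filter."
</body>
</html>
HTML;

// The sub-pattern on its own: a quote, one or more non-quote characters
// captured in group 1, then a closing quote.
$pattern = '/"([^"]+)"/';

// preg_match_all returns the number of full matches it found.
$count = preg_match_all($pattern, $input, $matches);

echo $count, "\n";
foreach ($matches[1] as $captured) {
    echo $captured, "\n";
}
```

Since every quoted passage matches, we still need a way to pick the right one.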

We're not interested in the quotes themselves, so we put them outside the parentheses: one to the left of the opening parenthesis and one to the right of the closing parenthesis. That way the quotes don't appear in the captured result.
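A quick sketch to check this, using a one-line excerpt of the sample text as input: the full match still includes the quotes, while the group only holds what is inside the parentheses.

```php
<?php
$input = 'are interested in: "This text is what I want to filter."';
$pattern = '/"([^"]+)"/';

preg_match_all($pattern, $input, $matches);

// Index 0 holds the text matched by the whole expression, quotes included.
$withQuotes = $matches[0][0];
// Index 1 holds only what the parentheses captured, so the quotes are gone.
$withoutQuotes = $matches[1][0];

echo $withQuotes, "\n";
echo $withoutQuotes, "\n";
```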

Now we have to specify which match of the sub-pattern we want. This is quite easy: just put the text that precedes it in front of the sub-pattern, in this case the phrase "we are interested in:". The pattern becomes:

we are interested in: "([^"]+)"

In this example there's exactly one space between the colon and the first quote. If the amount of whitespace is unknown, we use the metacharacter \s, which matches any whitespace character (space, tab, line break and so on), followed by an asterisk, which repeats it zero or more times. So the final expression looks like this:

we are interested in:\s*"([^"]+)"
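Here is a small comparison of the two expressions above. The four inputs are made up for illustration and differ only in the whitespace after the colon; the one-space version only handles the first, while \s* handles all of them, including a line break followed by a tab and no space at all.

```php
<?php
// Without \s*: exactly one space must follow the colon.
$strict = '/we are interested in: "([^"]+)"/';
// With \s*: any run of whitespace, or none, may follow the colon.
$flexible = '/we are interested in:\s*"([^"]+)"/';

// Hypothetical inputs that differ only in the whitespace after the colon.
$inputs = [
    'we are interested in: "one space"',
    'we are interested in:   "three spaces"',
    "we are interested in:\n\t\"newline and tab\"",
    'we are interested in:"no space"',
];

$strictCounts = [];
$captured = [];
foreach ($inputs as $input) {
    // Number of matches found by the one-space pattern.
    $strictCounts[] = preg_match_all($strict, $input, $strictMatches);
    // Text captured by the flexible pattern (null if nothing matched).
    preg_match_all($flexible, $input, $flexibleMatches);
    $captured[] = $flexibleMatches[1][0] ?? null;
}

print_r($strictCounts);
print_r($captured);
```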

And now some PHP code to get the final result ;-)

// $input holds the HTML source shown above
$pattern = '/we are interested in:\s*"([^"]+)"/';
if ( preg_match_all( $pattern, $input, $matches ) ) {
    $final_result = $matches[1][0]; // first capture of the sub-pattern
}

The pattern has to be surrounded by delimiters, which must be characters that are not alphanumeric, not a backslash and not whitespace, such as /. preg_match_all finds every match, returns how many it found and stores them in the array $matches. By default, $matches[0] holds the matches of the full expression and $matches[1] those of the sub-pattern. Each of these is itself an array indexed by match number, so in this example $matches[1][0] contains "This text is what I want to filter."
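The following sketch makes both points concrete, using a shortened excerpt of the sample text as input. It runs the final expression once with / and once with ~ as delimiters (the choice of ~ is just an example) and prints the resulting array, whose first index is the group and second index the match number.

```php
<?php
$input = 'Now here\'s the text we are interested in: "This text is what I want to filter."';

// Same expression, two different delimiters.
$slash = '/we are interested in:\s*"([^"]+)"/';
$tilde = '~we are interested in:\s*"([^"]+)"~';

preg_match_all($slash, $input, $matches);
preg_match_all($tilde, $input, $matchesTilde);

// Default ordering (PREG_PATTERN_ORDER): $matches[group][match number].
print_r($matches);
```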
