Making PHP Regex Errors Real

PHP employs Perl Compatible Regular Expressions (PCRE) in the built-in collection of preg_* functions, such as preg_match(). While PCRE is certainly the preferred regular expression library, PHP’s implementation allows the functions to fail without any explicit warning: the user must check preg_last_error() to know that an error occurred. Often, the return value of a regular expression match is checked, and different operations are performed depending on whether the regex matched.

/**
 * Find primes with a regex.
 * http://montreal.pm.org/tech/neil_kandalgaonkar.shtml
 */
function isPrime($num) {
    $num = str_repeat('1', $num);
    $ret = preg_match('/^1?$|^(11+?)\1+$/', $num);
 
    echo 'Return value is ';
    var_dump($ret);
 
    if ($ret === 0) {
        echo "Prime\n";
    } else {
        echo "Not prime\n";
    }
}

Looks perfectly sensible. Through some mathematical regex trickery, we determine whether or not a number is prime. For reasons beyond the scope of this article, this regex fails under default PHP configurations beginning at the number 22201 because PHP’s regular expression backtracking limit is exceeded. While the documentation for preg_match() claims it will return boolean false if a PREG_BACKTRACK_LIMIT_ERROR occurs, the function actually returns integer 0. In the case of the above function, PHP will start calling everything above 22200 a prime number. Even if the documentation were correct, we wouldn’t be much better off: every number would be classified as a composite number.

How do we deal with this? You must check preg_last_error() every time a PCRE function is used. That warning deserves emphasis for a reason: the results of failing to check preg_last_error() can be even more destructive than improperly classifying integers. The function preg_replace() returns null when an error occurs, which PHP will happily coerce to 0 or the empty string depending on context. It is very easy to assume that your regular expression replacement went through successfully and keep trucking along, but your users will not be happy with that null value when it’s used in a string context.
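As a rough illustration of what checking by hand looks like, here is a minimal sketch of the prime test rewritten so that a PCRE failure is reported instead of being mistaken for "no match". The function name checkedIsPrime() and the choice of RuntimeException are assumptions made for this example, not part of any library.

```php
<?php
/**
 * Sketch: classify a number with the same regex, but treat a PCRE
 * failure as an error rather than as a failed match.
 * checkedIsPrime() is a hypothetical name used only for illustration.
 */
function checkedIsPrime(int $num): bool
{
    $unary = str_repeat('1', $num);
    $ret = preg_match('/^1?$|^(11+?)\1+$/', $unary);

    // A failed match and an aborted match can look alike, so ask PCRE
    // directly whether the last call ended in an error.
    if ($ret === false || preg_last_error() !== PREG_NO_ERROR) {
        throw new RuntimeException(
            'preg_match() failed with PCRE error code ' . preg_last_error()
        );
    }

    // The pattern matches 0, 1 and composites, so "no match" means prime.
    return $ret === 0;
}

var_dump(checkedIsPrime(7)); // bool(true)
var_dump(checkedIsPrime(9)); // bool(false)
```

The same two-part check (return value plus preg_last_error()) would have to be repeated after every preg_* call, which is exactly the kind of boilerplate that is easy to forget.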

The solution to these ills is the newly released gosRegex module of the Genius Open Source library. This new module provides simple wrappers for all of the PCRE functions in PHP, checking preg_last_error() for you and turning any errors into exceptions.

// Use the gosRegex functions exactly like their preg_* counterparts
gosRegex::match('/foo (bar)/', 'foo foo bar foo baz foo', $matches);
print_r($matches);
 
// If you do something that causes an error, the gosRegex functions let you know
try {
    // Example from http://us.php.net/preg_last_error
    gosRegex::match('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');
} catch (gosException_RegularExpression $e) {
    print "Got a regex error: " . $e->getMessage() . "\n";
}
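To make the idea behind such a wrapper concrete, here is a minimal sketch of how one might be written. It is not the gosRegex source: the names RegexException and safeMatch() are hypothetical, and the error is triggered with invalid UTF-8 under the /u modifier simply because that failure does not depend on configured limits.

```php
<?php
/**
 * Hypothetical exception type for this sketch; it is not the library's
 * gosException_RegularExpression class.
 */
class RegexException extends RuntimeException
{
}

/**
 * Hypothetical wrapper: behaves like preg_match(), but throws when PCRE
 * reports an error instead of returning a value that looks like "no match".
 */
function safeMatch(string $pattern, string $subject, ?array &$matches = null): int
{
    $result = preg_match($pattern, $subject, $matches);
    $error = preg_last_error();

    if ($result === false || $error !== PREG_NO_ERROR) {
        // Map the numeric code to its constant name for a readable message.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        ];
        throw new RegexException($names[$error] ?? 'PCRE error ' . $error, $error);
    }

    return $result;
}

// Normal use mirrors preg_match().
safeMatch('/foo (bar)/', 'foo foo bar foo baz foo', $matches);
print_r($matches);

// An invalid UTF-8 subject with the /u modifier makes PCRE report an error.
try {
    safeMatch('/./u', "\xff");
} catch (RegexException $e) {
    print "Got a regex error: " . $e->getMessage() . "\n";
}
```

Carrying the PCRE error code as the exception code keeps the original information available to callers that want to react differently to, say, a backtrack limit error and a bad UTF-8 subject.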

So grab the Genius Open Source library and start being safe with your regular expressions in PHP.
