Code Coverage is a False Security

As mentioned in a previous post, at Genius.com one way we judge the effectiveness of our unit testing is by monitoring the percentage of code covered by our tests. While this is not an unusual way to measure the comprehensiveness of unit testing, the goal of unit testing should not be achieving 100% code coverage. The goal, instead, should be achieving 100% functionality coverage. While the difference between the measures may appear subtle, focusing on the correct goal can profoundly affect the way developers approach writing tests and the quality of the resulting tests.

Code Coverage Analysis

Loosely defined, code coverage is the measurement of classes, methods, and lines of production code covered by tests. In the PHP world, PHPUnit has the built-in ability (when used in conjunction with Xdebug) to generate extremely handy reports detailing code coverage. It is possible to view how many times an individual line of code was hit during test execution as well as aggregate data on the number of classes, methods, and lines tested at a file or directory level.

Functionality Coverage Analysis

Functionality coverage (not to be confused with functional testing), on the other-hand, measures the behavior covered by tests. Unlike code coverage, measuring functionality coverage is an extraordinarily difficult endeavor. Functionality is often a subjective term and is difficult to codify in a way that can be parsed and measured by automated tools and reported in a format consumable by people.

False Security of Code Coverage

For illustrative purposes, let us imagine a user class designed to perform CRUD operations:

class user {
    protected $fields;
 
    protected function __construct($id=null) {
        if ($id != null){
             $this->fields = db::getRow("SELECT * FROM user WHERE id=$id");
        }
        $this->fields = array('id' => null, 'login' => null, 'password' => null);
    }
 
    public static function getByID($id) {
        return new user($id);
    }
 
    public static function getNew() {
        return new user();
    }
 
    public function update() {
        db::execute("UPDATE user SET login='{$this->fields['login']}', password='{$this->fields['password']}' WHERE id={$this->fields[$id]}");
    }
 
    function delete() {
        db::execute("DELETE FROM user WHERE id={$this->fields[$id]}");
    }
 
    public function getColumn($columnName) {
        return $this->fields[$columnName];
    }
 
    public function setColumn($columnName, $value) {
        if ($columnName == 'password') {
            $newValue = hash($value);
        }
        $this->fields[$columnName] = $value;
    }
}

When writing tests for this class with a purely code coverage driven mindset, one might write a single test for the setColumn() method like:

class userTest {
    function testSetColumn() {
        $newPassword = 'I love IE6'
        $obj = user::createNew();
        $obj->setColumn('password', $newPassword);
    }
}

This would result in 100% code coverage for not only the setColumn() method but also the getColumn() method. However, the biggest problem is that the test does not uncover a rather glaring bug because it doesn’t actually assert anything!

Real Security With Functionality Coverage

If the developer is functionality coverage driven when writing tests, she will approach testing differently. First, a functionality driven developer won’t omit an assertion. After all, how can you assert functionality is working if you aren’t testing the result of the operation? identify the execution of the if statement in the setColumn() method as necessarily doubling the amount of functionality encapsulated by the method. The reason it doubles the amount of functionality is that it means that the code following the if statement can be executed with two different pre-conditions.

  1. The if test evaluates to TRUE and the code inside the if statement executes → works properly
  2. The if test evaluates to FALSE and the code inside the if statement is not executed → php error

If another if statement gets added below the password if statement, the amount of functionality would double again.

    public function setColumn($columnName, $value) {
        if ($columnName == 'password') {
            $newValue = hash($value);
        }
        if (!is_escaped($value)) {
            $newValue = escape($newValue);
        }
        $this->fields[$columnName] = $newValue;
    }

Now there are four paths through setColumn()

  1. $columnName is password + $value is NOT escaped
  2. $columnName is password + $value is escaped
  3. $columnName is NOT password + $value is NOT escaped
  4. $columnName is NOT password + $value is escaped

In the former case, a functionality coverage driven developer will write two tests to cover the setColumns() method. After writing the second test (for a non-password field), she will immediately notice the bug upon running PHPUnit. The same logic can be applied to the latter case to derive at least four tests.

Relationship Between Code Coverage and Functionality Coverage

The causal relationship between comprehensive functional coverage and high code coverage percentage is a one-way relationship. Comprehensively testing the functionality of your code guarantees high code coverage but high code coverage does not guarantee all functionality is tested. 100% coverage should be a by-product of 100% functionality coverage and thus, while you might measure code coverage, it is very important to look beyond the percentages to analyze the quality of the tests.

  • Digg
  • StumbleUpon
  • del.icio.us
  • Facebook
  • Twitter
  • Google Bookmarks
  • DZone
  • HackerNews
  • LinkedIn
  • Reddit
  • http://www.robertpeaslee.com Robert Peaslee

    Measuring test completeness by code coverage isn’t necessarily bad security, just bad testing methodology – a point I think you made well. Great post :)

  • alb

    Why create new terms? What you’re describing is branch coverage.

    • http://www.genius.com Ryan Ausanka-Crues

      I think it’s an issue of semantics. While it is true that branch coverage is roughly equivalent, I feel that the semantic difference is that branch coverage (as well as decision coverage and path coverage) is focused on testing the way code was written as opposed to focusing on how the method behaves from a caller’s perspective. I should have covered this in the post, thank you for pointing it out.

  • http://www.atlassian.com/software/clover Jesse Gibbs

    Code coverage results can definitely be misinterpreted. The absence of coverage indicates a risk, but the presence of coverage doesn’t necessarily mean you are safe from bugs.

    One way to keep code coverage useful is to look at the most complex code that has the least coverage, and to find classes that have decreased coverage over time. We (Atlassian) are encouraging Java developers to do this with our Clover tool, that highlights the ‘riskiest’ code (based on complexity vs. coverage) as well as ‘movers’ that have reduced coverage in recent builds.