So why is code coverage so bad? There seems to be a strong drive from management to reach high coverage numbers (which is great: it reduces the risk of having test holes). But coverage is then often tied to the quality of the product, almost as if it were the only metric of quality. The critical problem with this approach is that code coverage tells you nothing about whether the code is correct, only that it was exercised.
Let me coin a new term here: Collateral code coverage.
Definition: Additional code coverage from executing a test compared to the minimum possible code coverage required to validate the behavior being tested. In other words, the amount of code coverage where the result of exercising that code is never verified by the test.
Let me illustrate with an example. Consider a small program that converts an input value in kilometers to miles. The program consists of a Windows Forms application that calls a class library component to convert the value and then displays the result on the form. Say we want to test that this sample application correctly converts an input of 1 km to 0.62 miles. We may write a unit test that calls the class library directly with the input value and verifies the output:
public void TestConversionOfKilometersToMiles()
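In Python terms (the original is a C#/.NET example, and the converter function and names here are hypothetical stand-ins), such a direct unit test might look like:

```python
# Hypothetical conversion component and unit test; names are illustrative,
# not from the original post.
KM_TO_MILES = 0.621371

def kilometers_to_miles(km):
    """The class-library logic under test: convert kilometers to miles."""
    return km * KM_TO_MILES

def test_conversion_of_kilometers_to_miles():
    # Call the conversion component directly and verify the rounded result.
    assert round(kilometers_to_miles(1.0), 2) == 0.62

test_conversion_of_kilometers_to_miles()
```

The only production code this test exercises is the conversion routine itself, so its coverage is close to the minimum needed to validate the behavior.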
But it’s equally common to use a UI-based test framework that enters the value, presses the convert button, and then reads the computed value from the output field (equally common, but far more complex…), which could look like:
public void TestConversionOfKilometersToMiles()
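A real UI-based test would drive the actual form through a UI automation framework; as a sketch of the idea, the simulated form class below stands in for that layer (everything here is illustrative, not a real automation API):

```python
# Illustrative stand-in for the form layer driven by a UI-based test.
KM_TO_MILES = 0.621371

def kilometers_to_miles(km):
    return km * KM_TO_MILES

class ConverterForm:
    """Simulated UI: an input field, a convert button, and an output field."""
    def __init__(self):
        self.input_text = ""
        self.output_text = ""

    def type_input(self, text):
        self.input_text = text

    def click_convert(self):
        # Parsing, conversion, and output formatting all execute here --
        # code a direct unit test of the converter would never touch.
        km = float(self.input_text)
        self.output_text = f"{kilometers_to_miles(km):.2f}"

def test_conversion_of_kilometers_to_miles_via_ui():
    form = ConverterForm()
    form.type_input("1")
    form.click_convert()
    assert form.output_text == "0.62"

test_conversion_of_kilometers_to_miles_via_ui()
```

Even in this toy version, the UI path pulls in input parsing and output formatting on top of the conversion itself; with a real form and automation framework, the extra code exercised is far larger.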
The tests have the same purpose and the same level of validation, but the UI-based test will cover considerably more code than the unit test. Since the purpose of both tests is simply to validate that the converted value is correct, the additional coverage from the UI-based approach is considered collateral.
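The definition can be stated as a simple set difference. In the sketch below, coverage is tracked per method, and the method names are hypothetical labels for the example application, not measured data:

```python
# Collateral coverage as a set difference (my framing of the definition above).
def collateral_coverage(covered, minimum_required):
    """Code a test exercised beyond the minimum needed to validate its behavior."""
    return covered - minimum_required

# Hypothetical per-method coverage for the two tests in the example.
unit_test_cov = {"Converter.Convert"}
ui_test_cov = {"Form.Load", "Form.ParseInput", "Converter.Convert", "Form.FormatOutput"}
minimum = {"Converter.Convert"}  # all either test actually verifies

assert collateral_coverage(unit_test_cov, minimum) == set()
assert collateral_coverage(ui_test_cov, minimum) == {
    "Form.Load", "Form.ParseInput", "Form.FormatOutput"
}
```

The unit test has no collateral coverage, while everything the UI test covers outside the converter is collateral.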
Collateral code coverage is common. It can come from unintentional coverage (test defects causing the test to execute unintended code branches) or from coverage that is intentionally left unverified. Very often test cases are designed to execute a scenario and then validate a single outcome. The longer the scenario, the more code is covered, but the level of validation remains the same. In some extreme cases we see test cases that cover thousands of lines of code but still contain only one validation. These tests obviously have high collateral code coverage. You may argue they are poorly designed tests (and I would agree), but the problem exists for almost any size of scenario (maybe excluding unit tests, depending on your definition of a unit).
In general, the higher the abstraction level of the test, the more collateral code coverage it generates.
Collateral code coverage is unavoidable and has the unfortunate side-effect of artificially inflating confidence in the product, which is why code coverage is a poor metric for product quality.
Ramifications on Model-Based Testing
But this is a blog about Model-Based Testing, so what are the ramifications of collateral code coverage on MBT?
In general, MBT operates at an even higher abstraction level than scenario-based tests, so it should be even worse with respect to collateral code coverage. In many cases I find this to be true: in my experience, model-based tests usually generate lots of collateral code coverage, which is a drawback. But MBT is a very broad concept, and in cases where MBT is used to generate input/output values for an API, this is not necessarily the case (see for example my posts on the Traveling Salesman Problem).
To enumerate a few model types and their ramifications:
UI/Data Flow Model
A high-abstraction model that mimics the behavior of data moving through a system under test. These models tend to have high collateral code coverage because they mimic user scenarios.
Input Combination Model
A low-abstraction model that feeds different combinations of input into a single unit under test. The additional collateral code coverage per generated test is normally low. However, these models can have a constant overhead from setting up the execution environment, depending on how tightly they interface with the unit under test.
Fuzz Model
A low-abstraction model that produces random sequences of input to a system for stress testing and security testing. Usually the goal is to crash the system under test, and no functional validation is performed. The objective of these models is to branch out and hit as much code as possible, so these models generate fully collateral code coverage.
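To make the last point concrete, here is a minimal sketch of such a random-sequence model. The system under test and its operations are hypothetical stand-ins; note that the loop asserts nothing about behavior:

```python
import random

class Sut:
    """Hypothetical system under test: a tiny stack-like component."""
    def __init__(self):
        self.buffer = []

    def push(self, value):
        self.buffer.append(value)

    def pop(self):
        if self.buffer:
            self.buffer.pop()

    def clear(self):
        self.buffer.clear()

def fuzz(steps=1000, seed=0):
    """Drive the SUT with a random operation sequence."""
    rng = random.Random(seed)
    sut = Sut()
    ops = [lambda: sut.push(rng.randint(-10, 10)), sut.pop, sut.clear]
    executed = 0
    for _ in range(steps):
        # No functional validation: we only care that no call crashes,
        # so every line of SUT code this loop covers is collateral coverage.
        rng.choice(ops)()
        executed += 1
    return executed

fuzz()
```

Such a run can rack up impressive coverage numbers while never checking a single result, which is exactly why its coverage says nothing about correctness.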
By all means, this is not a nail in the coffin for MBT, but it emphasizes the need for clever models, along with a careful disclaimer when you rely on code coverage data from MBTs. For example, code coverage obtained from fuzz modeling would be a horrible metric for product quality (unless you only care about not crashing).