Sunday, November 13, 2011

Model-Based Testing of Legacy Code – Part II – Risk Profiling

Last time I left you after describing my latest challenge – making sense of a large piece of legacy code. I alluded to the fact that we had started building a risk profile for the changelist that made up the difference between a well-tested root branch and a poorly tested branch.
Okay, so what did we do to tackle this immense problem? Well, our approach was to compute a risk profile based on a model of the probability of code lines containing bugs. Without digging too deep into the mathematics, this is how the profile is built:

A)     All lines of code for files that differ are loaded into the profile
B)     Each line of code is assigned two risk weights: Test coverage, and difference weights
C)     For each file the weights are summed up and normalized and a third weight is introduced: Revision weight

The three weights are combined into a total weight for each code line and aggregated up to file level. The computation is a weighted average of the normalized weights, where the test coverage and difference weights are multiplied together. The weights are defined as follows:

Test coverage weight – this weight is used for reducing risk for lines of code that have good test coverage. The weight of a code line is reduced by 10% for every test case that covers this particular line of code, so the more tests that hit the line, the more the weight is reduced. The assumption is that tests are conditionally independent and that a given test case has a 10% probability of detecting a bug on this code line. Furthermore, this 10% reduction is improved if the test case touches fewer code lines – the rationale is that the lower the level at which a test is written, the higher the likelihood that it validates the correct code line.
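To make the arithmetic concrete, here is a minimal sketch of the coverage weight. It assumes the 10% reduction compounds per test (which is what the independence assumption implies), and it leaves out the extra bonus for tests that touch few lines, since the exact adjustment is not given here. The function name `coverage_weight` is made up for illustration.

```python
def coverage_weight(num_covering_tests: int, reduction_per_test: float = 0.10) -> float:
    """Remaining risk weight of a code line hit by the given number of tests.

    Assumption: each test independently removes 10% of the remaining weight,
    so the weight after n tests is (1 - 0.10) ** n.
    """
    if num_covering_tests < 0:
        raise ValueError("number of covering tests cannot be negative")
    return (1.0 - reduction_per_test) ** num_covering_tests


if __name__ == "__main__":
    # An uncovered line keeps its full weight; each extra test shrinks it.
    for n in (0, 1, 5, 10):
        print(n, round(coverage_weight(n), 3))
```

Note that under this reading the weight never reaches zero: even a line hit by ten tests keeps roughly a third of its original weight.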

Difference weight – this weight is used for weighing the lines of code that were changed from our root branch higher than the ones that were not changed. In our programming language we distinguish between actual code lines and “empty” lines, which are lines that contain meta-data definitions and statements that are not actually executed at runtime (e.g. switch, try, catch, etc.). Empty lines cannot be covered by tests, and as such it makes no sense to include them in the risk profile – however, they can still differ from the root branch. The difference weight is set to 1.0 if the code line was changed and 0.0 if it was not changed. Empty lines that were changed are summed up and distributed evenly over all code lines in the file – the rationale being that we cannot identify which code line carries the risk, so we split it over all code lines so as not to ignore the fact that a line was actually modified.
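The redistribution of changed empty lines can be sketched like this. The input format (a list of `(is_code_line, was_changed)` pairs per file) and the function name are assumptions made for the example, not how our tooling actually represents a file.

```python
from typing import List, Tuple


def difference_weights(lines: List[Tuple[bool, bool]]) -> List[float]:
    """Compute a difference weight per line of one file.

    Each input entry is (is_code_line, was_changed). Code lines get 1.0 if
    changed and 0.0 otherwise; changed empty lines are counted and their
    total is spread evenly over all code lines. Empty lines themselves
    always get 0.0 because they are not part of the profile.
    """
    code_count = sum(1 for is_code, _ in lines if is_code)
    changed_empty = sum(1 for is_code, changed in lines if not is_code and changed)
    share = changed_empty / code_count if code_count else 0.0

    weights = []
    for is_code, changed in lines:
        if is_code:
            weights.append((1.0 if changed else 0.0) + share)
        else:
            weights.append(0.0)
    return weights


if __name__ == "__main__":
    # Three code lines (one changed) and two empty lines (one changed).
    sample = [(True, True), (False, True), (True, False), (True, False), (False, False)]
    print(difference_weights(sample))
```

In the sample, the one changed empty line adds a third to each of the three code lines, so the file's total difference weight still equals the number of lines that were modified.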

Revision weight – this weight kicks in at the file level (on code lines it is zero). It is extracted from the source control system and counts the number of changes that were made to the file since the branch was created. This effectively measures the “churn” on the files and weighs them by how often they are churned.
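Putting the three weights together, a file-level profile could be computed roughly as follows. This is only a sketch: the 50/50 split between the line-based score and the revision score is a placeholder I picked for the example (the real coefficients are not stated here), and the data layout and function name are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: file name -> (per-line (coverage, difference) weights, revision count)
FileData = Tuple[List[Tuple[float, float]], int]


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores so they sum to 1.0 (all zeros if the total is zero)."""
    total = sum(scores.values())
    return {name: (value / total if total else 0.0) for name, value in scores.items()}


def risk_profile(files: Dict[str, FileData],
                 line_share: float = 0.5,
                 revision_share: float = 0.5) -> Dict[str, float]:
    """Return the percentage risk per file; the percentages sum to 100.

    Per line, the coverage and difference weights are multiplied and then
    summed per file. That sum and the revision count are each normalized
    across files and combined as a weighted average (assumed coefficients).
    """
    line_scores = {name: sum(cov * diff for cov, diff in lines)
                   for name, (lines, _) in files.items()}
    revision_scores = {name: float(revisions) for name, (_, revisions) in files.items()}

    line_norm = _normalize(line_scores)
    revision_norm = _normalize(revision_scores)
    combined = {name: line_share * line_norm[name] + revision_share * revision_norm[name]
                for name in files}
    return {name: 100.0 * value for name, value in _normalize(combined).items()}


if __name__ == "__main__":
    profile = risk_profile({
        "file_a": ([(1.0, 1.0), (0.9, 1.0)], 3),  # two changed lines, little coverage
        "file_b": ([(0.81, 1.0)], 1),             # one changed line, two covering tests
    })
    for name, pct in sorted(profile.items(), key=lambda item: -item[1]):
        print(f"{name}: {pct:.1f}%")
```

Sorting the result by descending percentage gives the ordering used when picking which files to target first.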


Sample risk profile

To illustrate what this profile looks like I’ve generated a sample profile for some of our code. The code has been obscured by removing file name references, but each column on the y-axis represents an individual source file, and the x-axis shows the percentage risk associated with that file.

From this profile it is easy to spot outliers (or high-risk files) that can be selectively targeted for test coverage. The profile also gives you a strategy for reducing the risk at the fastest rate: start with the files with the highest risk and work downwards through the list. Furthermore, we can drill down on a single high-risk file to look at the file-level pattern.

This gives us a starting point for our analysis. If we were to ship the product right now we would incur 100% risk; however, the risk profile allows us to give a reasonable estimate of the effort involved in reducing the risk. We can, for example, write tests for a single high-risk file and thus estimate the effort required to reduce the risk by 3-4% (in our case). We can extrapolate this information and give an estimate of the effort required if we want to incur only 50% risk when shipping. Or, if we are date-driven, we know the amount of resources available and can estimate the level of risk we will be shipping with.
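The extrapolation works in both directions, and a rough sketch looks like this. It assumes the effort scales linearly with the risk removed, which is a simplification (lower-ranked files yield less reduction per test written), and the pilot figures in the example are invented, not our actual numbers.

```python
def effort_for_target(pilot_effort: float, pilot_reduction_pct: float,
                      target_risk_pct: float) -> float:
    """Effort needed to get from 100% risk down to the target risk.

    pilot_effort is what it cost (e.g. in person-days) to remove
    pilot_reduction_pct percentage points of risk on a single file.
    Assumption: cost per percentage point stays constant.
    """
    if pilot_reduction_pct <= 0:
        raise ValueError("pilot must have reduced the risk")
    return (100.0 - target_risk_pct) * pilot_effort / pilot_reduction_pct


def risk_for_budget(pilot_effort: float, pilot_reduction_pct: float,
                    budget: float) -> float:
    """Risk we would ship with if only `budget` units of effort are available."""
    if pilot_effort <= 0:
        raise ValueError("pilot effort must be positive")
    reduction = budget * pilot_reduction_pct / pilot_effort
    return max(0.0, 100.0 - reduction)


if __name__ == "__main__":
    # Invented pilot: 2 person-days of test writing removed 4 points of risk.
    print(effort_for_target(2.0, 4.0, 50.0))  # prints 25.0
    print(risk_for_budget(2.0, 4.0, 10.0))    # prints 80.0
```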

By re-running the risk profile after additional tests have been produced, and comparing the total absolute (that is, not normalized) risk of the new profile with that of the baseline profile, a risk reduction factor can be computed.
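As a small illustration, the reduction factor is just the relative drop in total risk. The comparison has to use the raw totals, because percentages that have been normalized to 100% would hide any change; the function name and the figures below are made up for the example.

```python
def risk_reduction_factor(baseline_total: float, new_total: float) -> float:
    """Fraction of the baseline's total absolute risk removed by the new tests."""
    if baseline_total <= 0:
        raise ValueError("baseline total risk must be positive")
    return 1.0 - new_total / baseline_total


if __name__ == "__main__":
    # Invented totals: raw risk summed over all files before and after new tests.
    print(risk_reduction_factor(200.0, 150.0))  # prints 0.25
```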

However! It’s crucial to understand that a risk profile is relative to a certain baseline. That is, it tells you where to target your efforts to bring down the risk at the fastest rate from where you are now, but it does not tell you anything about the current state of the product – this understanding needs to come from you! A risk profile will always sum up to 100% risk. A product could be in an excellent state and ready to ship, and if you run a risk profile on it at that point in time it will still report 100% risk. Don’t make the mistake of comparing risk profiles for different branches either, as you will be comparing apples and oranges. A risk profile is relative, and if your baselines do not match up, your risk profiles will not be comparable.

This was a pretty in-depth explanation of how we build risk profiles. I hope it is inspiring for others to read about our approach! Next time we will look into how we can use these risk profiles in our choice of automation strategy, and finally we will get to the Model-Based Testing part of it!
