To put things into perspective, our regression suite has thousands of highly stable automated tests, and it is run after any minute change to the product has been made. If any single test from the regression suite fails, the product change is reverted. Given this setting, our four main priorities are reliability, efficiency, maintainability, and ease of debugging.
Reliability

Reliability of the automated test cases is of course the top priority: we cannot afford to reject developer changes due to test instabilities.
However, reliability is often governed by the underlying framework rather than the tests themselves. In my experience, model-generated tests have the same reliability as any other functional test.
Efficiency

Execution speed of the tests is the main concern. We are time constrained in how long we can test a developer's change: we cannot grind the productivity of the organization to a halt by running an unbounded number of tests.
In terms of execution speed, model-generated tests suffer because they often repeat many steps between tests, effectively re-running the same piece of the test over and over, whereas a manually written test case could leverage database backup/restore to avoid redoing steps, or simply be designed smarter with less redundancy.
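To make the redundancy concrete, here is a minimal sketch (all names are hypothetical, not from any real MBT tool) contrasting generated tests that each repeat the same expensive setup with a manual suite that performs the setup once and restores a snapshot per test:

```python
import copy

def expensive_setup(db):
    # Stand-in for slow preparation steps (creating users, seeding data, ...).
    db["users"] = ["alice", "bob"]
    db["orders"] = []
    return db

def run_generated_suite(tests):
    # Model-generated style: every test re-runs the setup from scratch.
    setups = 0
    for test in tests:
        db = expensive_setup({})
        setups += 1
        test(db)
    return setups

def run_snapshot_suite(tests):
    # Manual style: run the setup once, snapshot it, restore per test.
    snapshot = expensive_setup({})
    setups = 1
    for test in tests:
        db = copy.deepcopy(snapshot)  # cheap "restore" instead of full setup
        test(db)
    return setups

tests = [lambda db: db["orders"].append("o1"),
         lambda db: db["orders"].append("o2"),
         lambda db: db["orders"].append("o3")]

print(run_generated_suite(tests))  # 3 setups
print(run_snapshot_suite(tests))   # 1 setup
```

With thousands of tests, the difference between N setups and one setup plus N cheap restores dominates the wall-clock time of the suite.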
But MBT also suffers from generating too many tests. Model exploration generates all possible combinations, with no means of assigning a priority to individual tests. Effectively we cannot make a meaningful choice between model-generated tests, so we are forced to take all of them, none, or a random selection. Because we want to minimize the risk of bugs slipping into the product, making arbitrary decisions is inadvisable (I have some ideas on how we can do better, but that is for another blog post).
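The combinatorial blow-up is easy to demonstrate. Assuming some hypothetical model parameters, even three small dimensions multiply out quickly, and without a priority signal the only selection left is an arbitrary sample:

```python
import itertools
import random

# Hypothetical model dimensions; a real model would have many more.
payment_methods = ["card", "invoice", "voucher"]
currencies = ["EUR", "USD", "DKK", "SEK"]
customer_types = ["new", "returning", "business"]

# Exhaustive exploration: every combination becomes a test.
all_tests = list(itertools.product(payment_methods, currencies, customer_types))
print(len(all_tests))  # 36 tests from just three small dimensions

# With no per-test priority, the only middle ground between "all" and
# "none" is a random sample -- an arbitrary decision.
random.seed(42)
subset = random.sample(all_tests, 10)
```

Each extra dimension multiplies the count, so exhaustive runs stop fitting in the time budget long before the model stops growing.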
Maintainability

Any regression suite will sooner or later need to change, because the requirements for the product change. Thus some amount of work will go into refactoring existing regression tests once in a while.
Model-generated tests are actually easier to maintain (given you have staff with the required competencies) than regular functional tests. This boils down to what I blogged about earlier – the essence being that we can more easily accommodate design changes, because changes to the model automatically propagate into the tests. The additional abstraction actually helps us in this case. Conceptually, it is also easier to understand what needs to be done to accommodate a change in requirements when you have a picture of the model's state space to look at.
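A minimal sketch of why this propagation works (a toy transition map, not a real MBT tool): the tests are derived from the model, so a design change edited in one place regenerates every affected test.

```python
# Hypothetical model: states map to {action: next_state}.
model = {
    "logged_out": {"login": "logged_in"},
    "logged_in":  {"view_cart": "cart", "logout": "logged_out"},
    "cart":       {"checkout": "logged_in"},
}

def generate_tests(model, start, depth):
    """Enumerate all action sequences of the given length from `start`."""
    if depth == 0:
        return [[]]
    tests = []
    for action, target in model[start].items():
        for tail in generate_tests(model, target, depth - 1):
            tests.append([action] + tail)
    return tests

for test in generate_tests(model, "logged_out", 3):
    print(" -> ".join(test))

# If the design changes (say, checkout now requires a confirmation step),
# we edit the model once and regenerate, instead of patching each test.
```

Contrast this with hand-written tests, where the same design change must be hunted down and patched in every test that exercises the affected flow.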
Ease of debugging

Congratulations, your test is failing! But is this a bug or a defective test case? This is the ultimate question we need to answer – and we want an answer fast!
A good starting point is to understand the conditions that hold before the failing step is taken. Reading the test code is not very helpful in establishing this, because it is auto-generated gibberish with no comments to help you understand it. So at first we might conclude that this is a problem.
However, in my experience, even though failing model-based tests are harder to debug, it is not impossible – it changes the game. It is now a matter of understanding how the generated test case relates to the state space of the model. The state the test was in before the offending step was taken is easily read, and for well-designed models you can trace the steps and in that way build up an understanding of the conditions.
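Tracing the steps can be sketched as a replay of the generated action sequence against the model (the same kind of hypothetical transition map as above) to recover the state the test was in just before the failing step:

```python
# Hypothetical model: states map to {action: next_state}.
model = {
    "logged_out": {"login": "logged_in"},
    "logged_in":  {"view_cart": "cart", "logout": "logged_out"},
    "cart":       {"checkout": "logged_in"},
}

def state_before_step(model, start, steps, failing_index):
    """Walk the model to find the state just before the failing step."""
    state = start
    for step in steps[:failing_index]:
        state = model[state][step]
    return state

# The generated test's action sequence; suppose step 2 ("checkout") failed.
steps = ["login", "view_cart", "checkout"]
print(state_before_step(model, "logged_out", steps, 2))  # "cart"
```

Knowing the model state at the point of failure tells you which preconditions should have held, which is usually enough to decide whether to suspect the product or the model.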
Once you get good at it, you start to read the model picture and realize important facts. For example, if one failing test makes a state transition that another test takes without failing, then something is amiss. You need to start investigating why your model does not mimic the system under test; this could be either a limitation of your model you need to handle or an actual bug in the product.
Conclusion

Model-generated tests for regression suites make sense on some points and are even better on the maintainability aspect, but unfortunately they fall flat on their face when we start considering efficiency. The lack of optimization in execution, combined with a complete lack of test prioritization, makes for a poor test pool when trying to establish what is relevant to run for regression.