Well – what are some of the fundamental design differences when building services in the cloud? Scale – it is all about scale, and with scale come fragmentation and connectivity issues. Instead of a giant monolithic application maintaining state, the state must either travel with each request or live in some form of distributed store. These points become interesting to attack from a testing perspective. You start asking yourself: What happens if a service is suddenly unavailable? In a buggy implementation, could I lose part of my state? Can the services recover from that error condition? Are some services more vulnerable than others? What if a service dies just after I sent it a message – will my time-out recovery mechanism handle this correctly?
To make things more concrete, I’ll give an example of a cloud-based application. Imagine we are testing an online web shop composed of a website and three supporting services – one each for authentication, payments, and shopping carts – talking together to provide a fully functioning application. The underlying implementation could be fragmented like this.
Immediately we start asking questions like: What if the shopping cart service goes down – will my user lose the selection? What if the payment service does not respond to a payment request – is the error propagated to the website, or will it be swallowed by the shopping cart service?
All interesting questions, but how do we test for these scenarios? First of all, we need an automated way of injecting faults into the system. Assuming we can build actions like “kill service X”, “restore service X”, “put service X into a bad state”, and “force service X to send a malformed response”, we can write automation for these scenarios.
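As a sketch of what such an automation layer might look like – the `ServiceController` class and its method names are hypothetical, not an existing framework:

```python
# Hypothetical fault-injection layer; the class and method names are
# illustrative, not an existing API.

class ServiceController:
    """Drives fault-injection actions against one named service."""

    def __init__(self, name):
        self.name = name
        self.state = "up"       # "up", "down", or "bad"

    def kill(self):
        """'kill service X' - e.g. stop the process or block its port."""
        self.state = "down"

    def restore(self):
        """'restore service X' - bring the service back up."""
        self.state = "up"

    def corrupt(self):
        """'put service X into a bad state' / force malformed responses."""
        self.state = "bad"


# A fault-injection scenario is then just a scripted sequence of actions:
cart = ServiceController("shopping-cart")
cart.kill()
cart.restore()
assert cart.state == "up"
```

With these primitives in place, any test scenario can interleave normal application steps with fault-injection steps.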
Then I had this idea: why not model the fault injection itself? Essentially, we include the fault state of each service in the model’s state space and implement model actions that change those fault states – explicitly taking down a service or making it produce malformed requests and replies. Because these rules are available to the model, test case generation will drive all possible fault-injection scenarios. An added benefit is that it will even verify that test scenarios can be carried out correctly after a service is restored!
1-service model

Let’s see what happens when we try this for a single service (the shopping cart). I’ve created a very simple model where we can add items to the cart and check out. On top of this we can kill and restore the cart service:
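A minimal Python sketch of such a model follows; the names and the state encoding are my illustration (the actual model would be written in Spec Explorer’s C#/Cord), but the exploration behaves the same way:

```python
# Minimal sketch of the 1-service model (illustrative names/encoding,
# not Spec Explorer syntax). State = (items_in_cart, cart_service_status).

def enabled_actions(state):
    items, status = state
    if status == "down":
        return ["RestoreService"]
    actions = ["KillService"]
    if items < 1:                      # cap the cart at one item to keep
        actions.append("AddItem")      # the state space small and finite
    if items > 0:
        actions.append("Checkout")
    return actions

def step(state, action):
    items, status = state
    if action == "AddItem":
        return (items + 1, status)
    if action == "Checkout":
        return (0, status)
    if action == "KillService":
        return (items, "down")
    if action == "RestoreService":
        return (items, "up")           # restored == "up": indistinguishable
                                       # from a service never killed

def explore(initial):
    """Enumerate all reachable model states, as exploration would."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for action in enabled_actions(state):
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

print(sorted(explore((0, "up"))))
```

Because `RestoreService` maps straight back to `"up"`, only four states are reachable – the restore collapses into the never-killed state, which is exactly the equivalence that causes the problem below.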
The generated test cases are:
Notice that this is not exactly what we want: there are no tests that add items to the cart, kill the service, restore it, and proceed to check out. The problem is that Spec Explorer believes a restored service is in exactly the same state as before it was killed. Thus there is no need to test proceeding to check-out after a restore, because test case S0 already covered that (when the service has not been killed). Spec Explorer sees a restored service as equivalent to one that was never killed. From a tester’s perspective, however, we would very much like to verify these scenarios. To force Spec Explorer to drive them, we can add a new fault code, “Restored”, so that the states differ. The model changes to:
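As a self-contained Python sketch (illustrative names and encoding, not the actual Spec Explorer model), the change amounts to giving a restored service its own status value, which enlarges the reachable state space:

```python
# Sketch of the modified model: a restored service now carries a distinct
# "restored" status, so it is no longer state-equivalent to one that was
# never killed. (Illustrative encoding, not the original C# model.)

def enabled_actions(state):
    items, status = state
    if status == "down":
        return ["RestoreService"]
    actions = ["KillService"]          # "up" and "restored" behave alike,
    if items < 1:                      # but are now distinct states
        actions.append("AddItem")
    if items > 0:
        actions.append("Checkout")
    return actions

def step(state, action):
    items, status = state
    if action == "AddItem":
        return (items + 1, status)
    if action == "Checkout":
        return (0, status)
    if action == "KillService":
        return (items, "down")
    if action == "RestoreService":
        return (items, "restored")     # the new, distinct fault code

def explore(initial):
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for action in enabled_actions(state):
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

states = explore((0, "up"))
# (1, "restored") being reachable forces test generation to cover
# adding an item and checking out *after* a restore.
assert (1, "restored") in states and len(states) == 6
```

The two extra states, `(0, "restored")` and `(1, "restored")`, are what make exploration generate the kill/restore/check-out scenario we wanted.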
And the generated tests are now:
Notice how test case S2 drives the desired behavior.
Conclusion

We explored how Model-Based Testing can be applied to cloud-based services.
It is critical to include any fault codes that we believe could cause a difference in the SUT’s behavior – otherwise model exploration could miss crucial test scenarios. In this case we believe a restored service could behave differently from one that has never gone down.
In the next part of the series we explore what happens when we model more than one service…