
Wednesday, November 21, 2012

Simulating Stochastic Models

Recently I’ve been tasked with working on an infrastructure system which comprises several computationally intensive tasks that can be scheduled on a series of host machines. Some tasks have dependencies on other tasks and cannot be scheduled before their dependencies have finished. In a simplified setup, we assume that any task can run on any machine, provided that machine is not busy running another task.

The following picture illustrates the system:


Host A runs tasks 1 and 3, while host B runs tasks 2 and 4. Task 4 has a dependency on task 1, and task 3 has one on task 2. When all tasks are completed, the job is considered completed. It is evident from the picture that a task (here, task 3) may be delayed in its scheduling due to resource constraints, even though its dependencies are satisfied.
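To make the setup concrete, here is a minimal sketch of how such a system could be simulated. This is my own illustrative code, not part of the actual infrastructure system: task durations, the greedy host assignment, and the method names are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal greedy simulation: tasks with durations and dependencies are
// scheduled on whichever host becomes free first. Purely illustrative.
public static class JobSimulator
{
    // Returns the simulated completion time of each task.
    public static Dictionary<int, double> Simulate(
        Dictionary<int, double> durations,
        Dictionary<int, int[]> dependsOn,
        int hostCount)
    {
        var hostFreeAt = new double[hostCount];    // when each host is idle again
        var done = new Dictionary<int, double>();  // task -> completion time
        var pending = new HashSet<int>(durations.Keys);

        while (pending.Count > 0)
        {
            // Pick a task whose dependencies have all been scheduled.
            int task = pending.First(t =>
                !dependsOn.ContainsKey(t) || dependsOn[t].All(done.ContainsKey));

            // A task starts when a host is free AND its dependencies are done.
            int host = Array.IndexOf(hostFreeAt, hostFreeAt.Min());
            double depsReady = dependsOn.ContainsKey(task)
                ? dependsOn[task].Max(d => done[d]) : 0.0;
            double start = Math.Max(hostFreeAt[host], depsReady);

            done[task] = start + durations[task];
            hostFreeAt[host] = done[task];
            pending.Remove(task);
        }
        return done;
    }
}
```

Feeding in the four tasks from the picture (task 4 depending on 1, task 3 on 2) on two hosts shows exactly the effect described: a task can end up waiting on a busy host even after its dependencies are satisfied.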

Sunday, June 24, 2012

Solving ‘The Monty Hall Problem’ using Models

Today is about my favorite probability puzzle – the Monty Hall problem. If you haven’t heard of it before, you are in for a treat. The problem goes as follows (from Wikipedia):
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats*. You pick a door, say No. 1 [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat**. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
Vos Savant's response was that the contestant should always switch to the other door. […]
Many readers refused to believe that switching is beneficial. After the Monty Hall problem appeared in Parade, approximately 10,000 readers, including nearly 1,000 with PhDs, wrote to the magazine claiming that vos Savant was wrong. (Tierney 1991) Even when given explanations, simulations, and formal mathematical proofs, many people still do not accept that switching is the best strategy.
*The probability of the car being behind any door is uniformly 1/3
**The door that is opened has to have a goat behind it, and it cannot be the one you picked initially; in case the host has multiple choices, it is assumed that s/he chooses uniformly at random
The problem is brilliant in its simplicity, and the correct answer feels extremely counter-intuitive at first. The natural tendency is to think that it makes no difference whether you switch or not – but the truth is that you should switch! Hopefully, once you are done reading this article, you will be convinced why.
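If the formal argument doesn’t convince you, simulation usually does. Here is a small Monte Carlo sketch of the game, respecting both footnotes above (uniform car placement, and a uniformly random host choice when the host has options):

```csharp
using System;

// Monte Carlo simulation of the Monty Hall problem. With many trials,
// switching wins about 2/3 of the time and staying about 1/3.
public static class MontyHall
{
    public static double WinFraction(bool switchDoor, int trials, int seed)
    {
        var rand = new Random(seed);
        int wins = 0;
        for (int i = 0; i < trials; i++)
        {
            int car = rand.Next(3);   // door hiding the car (uniform)
            int pick = rand.Next(3);  // contestant's initial pick

            // Host opens a goat door that is neither the pick nor the car;
            // rejection sampling makes the choice uniform when both remain.
            int open;
            do { open = rand.Next(3); } while (open == pick || open == car);

            // Doors are 0+1+2 = 3, so switching takes the remaining door.
            int final = switchDoor ? 3 - pick - open : pick;
            if (final == car) wins++;
        }
        return (double)wins / trials;
    }
}
```

Running 100,000 trials per strategy lands the switching win rate near 2/3 and the staying win rate near 1/3, which is exactly the counter-intuitive result.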

Wednesday, May 30, 2012

Modeling Multi-Threaded Applications

So last time we looked a bit at stochastic modeling, and I introduced the concepts Spec Explorer provides for modeling random actions/decisions. The example we looked at was a stochastic counter – which is just a tiny bit contrived. Actually, I can’t think of anywhere I would like to use a stochastic counter. Thus, I want to spend some time modeling a real stochastic example in this post.
A very common use of stochasticity is multi-threading/parallel processing. In a parallel system more than one thread can work on solving a problem at the same time (often running on multiple CPUs simultaneously). So why does this have to lead to stochastic systems? – you may ask. The fundamental answer is timing. To make full use of system resources in a parallel environment, you poll a pool of threads for the next available worker thread. Which thread is available is determined by “chance”, because it is impacted by the speed of the executing CPUs and the latency of external dependencies (RAM, bus, network, disk, etc.), which we (usually) cannot control.
Consider a fictitious parallel environment with 3 worker threads:
This component is used by assigning the Job Scheduler a Job to be processed. The scheduler holds this Job in its internal queue and waits for a worker thread to become available. Once a thread is available, it takes the incoming Job and starts processing it in its own execution context (i.e. CPU, RAM, etc.). After successfully processing the Job, the worker thread stores the result in an aggregated result list (which has to be thread-safe).
A very common sunshine scenario to test would be to give the Job Scheduler a single Job and check that a worker thread starts working on it and finally check the result from the aggregated result list.
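A bare-bones version of such a component could look like the sketch below. This is not the actual system under test – the class shape and names are mine – but it has the pieces described above: a queue of incoming jobs, a pool of worker threads, and a thread-safe aggregated result list.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Minimal job scheduler: jobs queue up, any available worker thread picks
// up the next one, and results land in a thread-safe aggregated collection.
public class JobScheduler<TJob, TResult>
{
    private readonly BlockingCollection<TJob> queue = new BlockingCollection<TJob>();
    private readonly List<Thread> workers = new List<Thread>();
    public readonly ConcurrentBag<TResult> Results = new ConcurrentBag<TResult>();

    public JobScheduler(int workerCount, Func<TJob, TResult> process)
    {
        for (int i = 0; i < workerCount; i++)
        {
            var t = new Thread(() =>
            {
                // Each worker blocks until a job is available, processes it
                // in its own execution context, and stores the result.
                foreach (TJob job in queue.GetConsumingEnumerable())
                    Results.Add(process(job));
            });
            t.Start();
            workers.Add(t);
        }
    }

    public void Schedule(TJob job) { queue.Add(job); }

    public void Shutdown()
    {
        queue.CompleteAdding();             // let workers drain the queue and exit
        foreach (var t in workers) t.Join();
    }
}
```

The sunshine scenario maps directly onto this sketch: schedule one job, shut down, and assert that the result list contains exactly the expected entry. Which of the three workers processed it is the stochastic part.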

Thursday, May 10, 2012

Modeling Stochastic Systems in Spec Explorer

One of the more advanced – and powerful – features of Spec Explorer is its ability to handle stochastic SUTs. Stochasticity is common in systems with external dependencies, for example the file system, the network interface, or external devices. Common to all these dependencies is that they sometimes fail for various reasons. You cannot predict when this will happen, and for the most part in software testing we strive to avoid these spurious behaviors at all costs.
Another source of stochasticity is the use of random in the SUT (which I will explore a bit more in this post) or stochastic behavior due to race-conditions in multi-threaded applications (which I will investigate further in a later post).
First of all, what does a stochastic model look like? Take a look here:
This picture represents a model of a stochastic accumulator. It shows a slice where the accumulator is increased twice then reset. Each addition is stochastic in that it can add either 1 or 2 to the counter.
When working with stochastic models, in addition to the normal “round” states, Spec Explorer introduces diamond-shaped decision states. Each diamond represents a choice in the model where the SUT can behave in one of many ways. In the case of the stochastic accumulator we can see that the value of the accumulator can go from the initial value of 0 to either 1 or 2 in the first addition step.
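For reference, the SUT behind such a model could be as simple as the following plain C# sketch (this is illustrative implementation code, not Spec Explorer model code; the class and member names are my own):

```csharp
using System;

// A sketch of the stochastic accumulator being modeled: each Add step
// increases the counter by either 1 or 2, chosen at random, which is why
// the model needs a diamond-shaped decision state after every Add.
public class StochasticAccumulator
{
    private static readonly Random rand = new Random();

    public int Value { get; private set; }

    public void Add()   { Value += rand.Next(1, 3); }  // upper bound exclusive: adds 1 or 2
    public void Reset() { Value = 0; }
}
```

After two Add calls the value is 2, 3, or 4 – exactly the set of round states the model slice reaches before the Reset step.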

Thursday, March 29, 2012

Collateral code coverage, ramifications on Model-Based Testing

My relationship to code coverage is one of love and hate. I love code coverage because it’s a quantitative metric which tells you something about the relation between your test suites and your product code. I hate code coverage because it must be the most widely abused metric in software testing. For more on this, check my previous post on ‘Application of Code Coverage to Model Based Testing’.

So why is code coverage so bad? There seems to be a strong drive from management to reach high coverage numbers (which is great – it reduces the risk of having test holes). But coverage is then often tied to the quality of the product – almost as the only metric of quality. The critical problem with this approach is that code coverage tells you nothing about whether the code is correct – only that it was exercised.
Let me coin a new term here: Collateral code coverage.
Definition: Additional code coverage from executing a test compared to the minimum possible code coverage required to validate the behavior being tested. In other words, the amount of code coverage where the result of exercising that code is never verified by the test.
Let me illustrate by an example. Consider a small program that converts an input value in kilometers to miles. The program consists of a Windows Forms application that calls a class library component, which converts the value, and displays the result on the form. Say we want to test that this sample application correctly converts an input of 1 km to 0.62 miles. We may develop a unit test that calls the class library directly with the input and expected output values:
        [TestMethod]
        public void TestConversionOfKilometersToMiles()
        {
            // Compare with a tolerance: 1 km is ~0.6214 miles, so an exact
            // double comparison against 0.62 would fail.
            Assert.AreEqual(0.62, Converter.KilometersToMiles(1), 0.005);
        }

But it’s equally common to use a UI based test framework which enters the value, presses the convert button, and then reads the computed value from the output field (equally common, but far more complex…), which could look like:
        [TestMethod]
        public void TestConversionOfKilometersToMiles()
        {
            Form.Kilometer.SetValue(1);
            Form.ConvertButton.Click();
            Assert.AreEqual(0.62, Form.Miles.GetValue<decimal>());
        }

The tests have the same purpose and the same level of validation, but the UI based test will cover considerably more code than the unit test. Since the purpose of both tests is simply to validate that the converted value is correct, the additional coverage from using the UI based approach is considered collateral.

Friday, March 23, 2012

Complex types in immutable containers and ‘magic rules’ – TSM Part III

In Part II we saw one approach to optimizing the growing algorithm by using a more intelligent concept for extending graphs than the brute-force way of Part I.
With the new approach implemented we can now lift our restriction on the input domain size. Effectively there is no need for constraining the grid domain when the algorithm works on the edges instead of grid input combinations.
The original implementation converted vertices and edges into integer representations (using index = y*size + x), but this approach is no longer applicable when the input domain is unbounded. The first step in fixing this is to refactor the model to store vertices and edges as structs instead:
    public struct VertexData
    {
        public int x, y;

        public VertexData(int x, int y)
        {
            this.x = x;
            this.y = y;
        }
    }

    public struct EdgeData
    {
        public VertexData v1, v2;

        public EdgeData(VertexData from, VertexData to)
        {
            this.v1 = from;
            this.v2 = to;
        }
    }

A SetContainer can hold structs, so our definition of active vertices in the model changes to:
        static SetContainer<VertexData> vertices = new SetContainer<VertexData>();

Because structs are simple value types rather than objects, the SetContainer will correctly identify permutations of the same sequence as equivalent. Had we used the Vertex class directly, the SetContainer would store object references instead of the object data, and sequence would matter.
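The value-vs-reference distinction is easy to demonstrate in plain C#. In the sketch below, the Vertex class is a hypothetical reference-type counterpart to the VertexData struct above:

```csharp
using System;
using System.Collections.Generic;

// Why the struct refactoring matters: structs compare by value,
// classes by reference identity.
public class Vertex
{
    public int x, y;
    public Vertex(int x, int y) { this.x = x; this.y = y; }
}

public struct VertexData
{
    public int x, y;
    public VertexData(int x, int y) { this.x = x; this.y = y; }
}

public static class EqualityDemo
{
    public static void Run()
    {
        var structs = new HashSet<VertexData>();
        structs.Add(new VertexData(1, 2));
        structs.Add(new VertexData(1, 2));   // equal by value, not added again
        Console.WriteLine(structs.Count);    // 1

        var classes = new HashSet<Vertex>();
        classes.Add(new Vertex(1, 2));
        classes.Add(new Vertex(1, 2));       // two distinct references, both added
        Console.WriteLine(classes.Count);    // 2
    }
}
```

The set of structs collapses the duplicate to a single element, while the set of class instances keeps both – which is exactly why the class-based model kept treating equivalent states as different.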

Sunday, February 26, 2012

Re-Modeling the Traveling Salesman – Part II

Last time we had a look at a very crude model for growing graphs for the Traveling Salesman Problem (TSP). I neglected to actually visualize what the solutions looked like, so here is a rendering of a couple:
Not very complex indeed... So what seems to be the problem here? Well, the model itself is very exhaustive, so it will generate any conceivable graph on the input domain. The single model rule is generic on the full input domain:
config Grid20x20: Main
{
    action abstract static void TravelingSalesman.InsertVertex(int x, int y)
        where x in {0..19}, y in {0..19};
}

This means that for each state Spec Explorer will try X * Y rules, so for larger input domains like 20x20, it takes forever to explore a state fully. What we are interested in with the TSP is not to construct a plethora of simple graphs, but to generate high complexity graphs (because they are much harder to solve). So what we need is a depth-first search (DFS), luckily Spec Explorer allows us to do just that using a config switch:
config Main
{
    ...
    switch DepthFirst = true;
    ...
}

Unfortunately, it turns out that the DFS exploration uses an “optimized” version of DFS, which always fully explores each state it visits instead of immediately exploring from newly discovered states. It doesn’t even show the extra states that were explored; they are simply hidden away until they get fully explored much later in the process (sorry guys, but that’s just silly). Anyway, that is exactly what we wanted to avoid: even with DFS exploration on a 10x10 grid, it takes hours to generate graphs with 10 vertices.


Tuesday, February 14, 2012

Modeling the Traveling Salesman

I’m going to attack a well-known problem from the 19th century in this post: the Traveling Salesman Problem (TSP).
The objective in the traveling salesman problem is to find the shortest path through a network graph that visits every node/vertex exactly once and returns to the origin; such a path is called a tour of the graph (or a Hamiltonian cycle, if you are into computer science). The problem is NP-hard, meaning it is likely that no feasible exact solution exists for large graphs. There exist many variants of the traveling salesman problem. I will consider the form of the problem in which the graph lies in a plane, also known as the Euclidean TSP. This puts constraints on the distances between nodes and is a symmetric formulation of the problem. But still, it is an NP-hard problem.
To put things into perspective, the number of distinct Hamiltonian cycles through a complete graph with n vertices is (n − 1)! / 2.
Enumerating all possibilities becomes infeasible for even small graphs. For n = 21 the number of solutions is 1,216,451,004,088,320,000. For n = 101 the number of solutions is approximately 5e+157, which exceeds the estimated number of atoms in the known universe. Of course, I’m not claiming to solve this problem – actually I’m not trying to solve it at all. Many better people have tried that before me. I’ll however focus on testing a TSP solver. It may be exceptionally difficult to solve the problem, but it is possible to construct graphs for which the solution is known (and non-trivial) without too much effort. A model for that could look like:
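The tour count – (n − 1)!/2 distinct Hamiltonian cycles through a complete graph on n vertices, since rotations and the direction of travel don’t produce new tours – is easy to verify exactly with BigInteger arithmetic:

```csharp
using System;
using System.Numerics;

public static class Tsp
{
    // Number of distinct Hamiltonian cycles in a complete graph on n vertices.
    public static BigInteger HamiltonianCycles(int n)
    {
        BigInteger result = 1;
        for (int i = 2; i <= n - 1; i++)
            result *= i;     // accumulate (n - 1)!
        return result / 2;   // direction of travel does not matter
    }
}
```

For n = 21 this yields exactly the 1,216,451,004,088,320,000 quoted above, and for n = 101 a number on the order of 10^157.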
I plan on doing a series of posts on this problem. In this first post I will define the problem and the method in which I’m constructing a graph for the TSP. In the following posts I will investigate different models for growing TSP graphs, and look into how they perform.

Monday, December 19, 2011

Are Test Oracles a mathematical nicety to cover-up problems?

Last week I had an interesting discussion with a colleague that did not agree with my take on Model-Based Testing. He was unwilling to buy my arguments for why Model-Based Testing is a more efficient way of automating test cases compared to traditional scenario automation. He was a firm believer in Model-Based Testing but his experiences told him that modeling was an expensive cost up-front, which could be made up for in maintenance savings, whereas I claimed that modeling was a much cheaper automation approach.

Furthermore, he argued that my model was close to useless because it did not come with a validation oracle. At a deeper level, the discussion also revealed some interesting discrepancies in what we each believed.
Needless to say, we never reached agreement on the subject – but I realized later on that we were arguing from completely different perspectives.
After the discussion I started thinking: “How can we be of so differing opinions?” After fumbling around with this question in my head over the weekend, I realized that we were using Model-Based Testing for two completely separate purposes.
A notion started to form in my head – it seemed that there were different ‘schools’ (or views) within Model-Based Testing.
·         Theoretical modelers: They want strict rules around their models and conduct rigorous validation to prove the conformity of their models. They have a theoretical approach to software testing and like to mathematically prove algorithms and construct test cases that cover just the necessary and sufficient cases.
·         Pragmatic modelers: They are more of the ad-hoc modeling type. They have a pragmatic approach to Model-Based Testing in which a model is valuable on its own. They understand that the model should be validated, but they can live with limited validation. They see value in the model as a means of communication.

Tuesday, October 18, 2011

Dealing with floating points in models - Part II

Today I’m going to follow up on Part I and give you the implementation details of the classes. I provided you with a generic interface:
IDomainVariableSampler<T>

Now as I pointed out previously, we can choose to implement this interface for any domain type, so let’s try implementing it for a double domain:
    public class DomainDoubleSampler : IDomainVariableSampler<double>
    {
        // A fixed seed keeps sampling reproducible across runs. The generator
        // must be created once, not per call, or every sample would be identical.
        private static readonly Random rand = new Random(1);

        // Offset for boundary neighbors. Note that double.Epsilon (~4.9e-324)
        // would be lost to rounding when added to any normal-sized boundary.
        private const double e = 1e-9;

        public double Maximum { get { return 100.0; } }
        public double Minimum { get { return -100.0; } }

        public double BoundaryNegative(double boundary)
        {
            return boundary - e;
        }

        public double BoundaryPositive(double boundary)
        {
            return boundary + e;
        }

        public double Sample(double lowerBound, double upperBound)
        {
            return lowerBound + rand.NextDouble() * (upperBound - lowerBound);
        }
    }

This implementation defines an input range of [-100.0, 100.0] and samples at random inside partitions of this interval.

Friday, October 14, 2011

James D. McCaffrey implements Recursive Binary Search Tree

Today I came across this blog article by James D. McCaffrey, where he actually implements a recursive binary search tree, and I couldn’t resist the urge to apply my own Model-Based Testing solution [1,2] to his implementation!

I had to extend his solution a bit to allow for an internal validation function of any constructed tree, but beyond that my only comment on his solution is that he treats error conditions by silently ignoring them – that is, the following actions generate no errors/exceptions:
A)     Deleting an element that does not exist in the tree.
B)     Inserting an element twice into the tree.

After patching these up in his code, all my model-generated test cases passed, so his implementation is rock-solid! Kudos for that!

References:
[1] Application of Model Based Testing to a Binary Search Tree - Part I
[2] Application of Model Based Testing to a Binary Search Tree - Part II

Wednesday, October 5, 2011

Domain input partitioning / dealing with floating points in models – Part I

Originally I wanted to post everything in one article, but it got too long, so I decided to split it and save the hardcore details for the next post.

I find that dealing with models involving floating point input can be a tad tricky, so why not post on this topic? The problem here is infinity. It stems from the fact that any given slice of a floating point argument is of infinite size (or at least close to it). Often when working with integer models we limit the input variables to a specified range, like: where x in {0..2}. However, this of course does not work if x is a floating point input.

So how do we deal with infinity? First of all we need to clarify what it is we want. Clearly we do not want to test every possible floating point value between 0 and 2. Also, what we want is determined by what we are testing. So let’s make up an example. Assume we are to test the square root function (of real values, to keep it simple) – what kind of inputs would we give it? If you are a software tester, your testing aptitude should kick in right about now. We want to try a sunshine scenario with a positive value. We also want to make sure any negative input gives an error, and we want to exercise the boundary case 0. Then we have some extreme ranges: a very large positive number, a large negative number, and the maximum and minimum values of the input type.

If we analyze our pattern here, all we are doing is partitioning the input domain. If you have been following this blog for a while you may recall I referenced equivalence class partitioning before – this is the same stuff. Simplifying a bit we end up with an input domain like:

The important part here is that we have fixed points of interest (the boundaries), and between them we have ranges. For the boundary cases, besides the actual boundary value, the near neighbors ±e are also of interest, where e is a very small number. For the ranges we do not really care what the exact number is; for want of better, we draw a random sample from the region.
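The scheme above can be sketched in a few lines. The helper below is my own illustration – the boundary list, the size of e, and the names are assumptions, not the actual partitioning implementation:

```csharp
using System;
using System.Collections.Generic;

// Given sorted boundary values, emit each boundary, its near neighbors ±e,
// and one random sample drawn from each open range between boundaries.
public static class DomainPartitioner
{
    public static List<double> TestPoints(double[] boundaries, double e, int seed)
    {
        var rand = new Random(seed);   // fixed seed for reproducible test inputs
        var points = new List<double>();
        for (int i = 0; i < boundaries.Length; i++)
        {
            points.Add(boundaries[i] - e);   // near neighbor below
            points.Add(boundaries[i]);       // the boundary itself
            points.Add(boundaries[i] + e);   // near neighbor above

            // Random sample from the range up to the next boundary.
            if (i + 1 < boundaries.Length)
                points.Add(boundaries[i] + rand.NextDouble()
                           * (boundaries[i + 1] - boundaries[i]));
        }
        return points;
    }
}
```

For the square root example one might call it with boundaries { -100.0, 0.0, 100.0 } and a small e, yielding eleven test points: three per boundary plus one sample per range.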

Note on Spec Explorer and small floating point values:
Unfortunately, I found a bug in Spec Explorer: it crashes when working with small floating point values. This prevents me from trying out combinations where the parameter is e. The implementation of the automatic domain partitioning will thus also be limited to testing only the boundary cases, and not their near neighbors as well.


Tuesday, September 6, 2011

Multi-Threaded Modeling – Barrier Synchronization

Okay, all my previous posts have been based on static system models, where rules and actions are static functions in classes. A lot can be accomplished with this, and it is also possible to wrap instance based applications inside static functions. However, Spec Explorer allows us to work with actual instance based models.

So what does this mean for us? Well, it means we are allowed to instantiate any number of objects of some type and Spec Explorer can reference each of these objects individually. For example, you could write a class that represents a complex number, and have Spec Explorer create two instances of this object and then add them together. This would be a basic example, but instead let’s jump to something more tricky – multi-threading!

You can imagine that instance based modeling and multi-threading are closely related. But there are some key issues one must understand first. Instance based modeling is not multi-threaded; it is exactly as sequential as any static model. Spec Explorer will not create a new thread for you when you instantiate an object. Any interaction by Spec Explorer will be from the “main” thread; you have to handle communication with your worker threads yourself.

Barrier synchronization
The example I have picked for this post is a barrier synchronization implementation. Barrier synchronization is a pattern for worker threads in multi-threaded applications. With barrier synchronization, each thread must wait for its co-workers to complete their cycle of work before the next cycle starts. Denoting the number of cycles completed by thread i as c_i, this can be stated by the following invariant:

          for all threads i and j: |c_i − c_j| ≤ 1

Essentially, each thread reaches a “barrier” upon completion of its work. This “barrier” is only lowered once all threads have reached it. Upon release of the “barrier”, the worker threads continue their work.
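As a concrete reference point, .NET 4 ships a ready-made System.Threading.Barrier class that implements exactly this pattern. The demo below is my own sketch (the Sleep-based “uneven work” is just for illustration), showing each thread signalling the barrier at the end of every cycle:

```csharp
using System;
using System.Threading;

// Barrier synchronization with .NET's built-in System.Threading.Barrier:
// no thread can get more than one cycle ahead of any co-worker.
public static class BarrierDemo
{
    public static int[] Run(int threadCount, int cycles)
    {
        var completed = new int[threadCount];    // c_i for each thread i
        using (var barrier = new Barrier(threadCount))
        {
            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                int id = i;    // capture loop variable for the closure
                threads[i] = new Thread(() =>
                {
                    for (int c = 0; c < cycles; c++)
                    {
                        Thread.Sleep(id * 5);    // simulate uneven work per thread
                        completed[id]++;
                        barrier.SignalAndWait(); // wait for all co-workers
                    }
                });
                threads[i].Start();
            }
            foreach (var t in threads) t.Join();
        }
        return completed;
    }
}
```

When Run returns, every thread has completed exactly the same number of cycles, and during execution the cycle counts never differed by more than one – the invariant above.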

Sunday, July 24, 2011

Model-Based Integration Testing – Part II (State encapsulation)

Last time we saw how we could perform Model-Based Integration Testing of a FileSystem model and a Network model using extension models. One problem with this approach was that the first model had to expose its internal state to the other model. In this post I’m going to talk about model state space encapsulation.

A step in the right direction is to do model encapsulation. Let’s start out by obtaining the same model results, but having the FileSystem model encapsulate its state. Simply change the accessibility of all internal state variables to private:
        private static int drives, currentDirectory;
        private static SequenceContainer<int> directories = new SequenceContainer<int>();

Now of course our Network model won’t compile, as it was referencing the FileSystem state variables directly. We have to realize what the common concept is here – the Network layer is supposed to mimic a directory structure, and to do this it must have an interface for creating a drive in the FileSystem model’s state space. One easy way to obtain this is to increase the accessibility of the CreateDrive rule to public, so the Network model can directly call a rule in the FileSystem model:
        [Rule(Action = "MapNetworkDrive()")]
        static void MapNetworkDrive()
        {
            FileSystem.CreateDrive();
        }

Friday, May 27, 2011

Application of Model Based Testing to a Binary Search Tree - Part II

Okay, today I want to wrap up the model based testing of the binary search tree implementation I did last time. Remember how we uncovered a problem that the model did not cover all of the code? Drawing on our experience from the Application of Code Coverage to Model Based Testing post, we understand that our model does not closely reflect the actual implementation, and we have a risk in terms of a test hole.

Understanding the problem
Before we jump into adding additional tests, let’s try to understand what the problem really is. Remember I hinted that it has to do with our choice of container in the model. So let’s try to understand this some more by building some trees from the model:

Notice that even though these three trees are very different constructs, the internal set representation of the model reads (0, 1, 2) for all cases.

Friday, May 20, 2011

Application of Model Based Testing to a Binary Search Tree - Part I

I wanted to post some real modeling examples for a change, where I show how to use model based testing to explore test case combinatorics. The obvious choice is of course the more than sufficiently modeled calculator. So I decided not to choose a calculator, but something a bit different. I thought to myself, why not a simple Binary Search Tree? Hmm, but does it have any potential?

BSTs are really nice in that you can write invariants for them: 
For all nodes n in T: value(left(n)) < value(n) < value(right(n))

However, in a normal functional testing paradigm this is not entirely sufficient to validate the tree. The problem is that any given sub-tree of the BST will pass the integrity check – thus if I were to introduce a bug that removed the whole sub-tree when deleting a node from the tree, the resulting tree is still a valid BST but it’s not the expected! Normally we would need to check the node count and also that the expected values are to be found in the tree, however in a model based testing paradigm this is no longer required as we will see later on.