Wednesday, October 5, 2011

Domain input partitioning / dealing with floating points in models – Part I

Originally I wanted to post everything in one article, but it got too long, so I decided to split it and save the hardcore details for the next post.

I find that dealing with models involving floating point input can be a tad tricky, so why not post on this topic? The problem here is infinity. It stems from the fact that any given slice of a floating point argument is of infinite size (or at least close to). Often when working with integer models we limit the input variables to a specified range like: where x in {0..2}. However, this does of course not work if x is a floating point input.

So how do we deal with infinity? First of all we need to clarify what it is we want. Clearly we do not want to test every possible floating point values between 0 and 2. Also, what we want is determined by what we are testing. So let’s make up an example. Assume we are to test the square root function (of real values to keep it simple), what kind of inputs would we give it? If you are a software tester your testing aptitude should kick-in right about now. We want to try a sunshine scenario with a positive value. We also want to make sure any negative input gives an error and also the boundary case 0 gives an error. Then we also have some extreme ranges like a very large positive number, a large negative number and the maximum and minimum values of the input type.

If we analyze our pattern here, all we are doing is partitioning the input domain. If you have been following this blog for a while you may recall I referenced equivalence class partitioning before – this is the same stuff. Simplifying a bit we end up with an input domain like:

The important part here is that we have fixed points of interest (the boundaries) and between we have ranges. For the boundary cases, besides the actual boundary value of interest is the near neighbors ±e, where e is a very small number. For the ranges we do not really care what the exact number is, for want of better we draw a random sample from the region.

Note on Spec Explorer and small floating point values:
Unfortunately I found a bug in Spec Explorer, that it crashes when working with small values in floating point. This limits me from trying out combinations where the parameter is e. The implementation of the automatic domain partitioning will thus also be limited to testing only at boundary cases and not their near neighbors as well.

So, let’s try and reduce this problem to some requirements. Clearly we would like to specify the domain we are working in, this could be integers single/double-precision floating point or even strings! The domain should also have specified a range (our MIN and MAX), and we need the ability to insert boundary values. That pretty much sums it up.

The great thing is that this can be solved using generics! The following interface defines a generic sampler for a domain that fits our specifications:

    public interface IDomainVariableSampler<T>
        T Maximum { get; }
        T Minimum { get; }

        T BoundaryNegative(T boundary);
        T BoundaryPositive(T boundary);
        T Sample(T lowerBound, T upperBound);

In appendix [1] you can find the source code along with implementations of the following domains:

               public class DomainDoubleSampler : IDomainVariableSampler<double>
               public class DomainIntegerSampler : IDomainVariableSampler<int>

We need a mechanism for converting an index into a boundary value or domain partition sample; this is handled by the following class:

    public class DomainInput<T, X> : List<T>
        where X : IDomainVariableSampler<T>, new()
        where T : IComparable<T>
        public static DomainInput<T,X> Empty
        public int NumberOfInputs

        private DomainInput()
        private DomainInput(DomainInput<T,X> clone)

        private void Sort()

        public bool ValidIndex(int index)
        public T Input(int index)

        public DomainInput<T, X> Boundary(T b)

The DomainInput class is the heavy lifter of the solution; it handles any domain type implementing IDomainVariableSampler<T>. Using this implementation you simply call Input(int index) to either return a boundary value or sample from a domain partition.

Applying the DomainInput class
Let’s see how this works in practice. Observe the following model on a double domain:

        public static DomainInput<double, DomainDoubleSampler> InputMap = DomainInput<double, DomainDoubleSampler>.Empty.Boundary(0.0);

        static double value = 0.0;

        [Rule(Action = "Compute(x)/result")]
        static double ComputeRule(int x)
            value = Accumulator.InputMap.Input(x);
            return Math.Round(value, 2);

Setting up a domain with a boundary value at 0.0 and exploring this tiny model shows off the capabilities of the solution and provides the following input pattern to the Compute method:
Here you can read off that it sampled the following values:
-100, -75.13, 0, 24.87, 100

Exactly as we wanted it to! Likewise we can do the same for an integer domain:

        public static DomainInput<int, DomainIntegerSampler> InputMap = DomainInput<int, DomainIntegerSampler>.Empty.Boundary(0);

Now we sample:
               -100, -23, 0, 77, 100

Or we can include an additional boundary value at 10:

        public static DomainInput<int, DomainIntegerSampler> InputMap = DomainInput<int, DomainIntegerSampler>.Empty.Boundary(0).Boundary(10);

               -100, -23, 0, 7, 10, 79, 100

Notice this is just a demo using exploration, normally you would not want to handle the floating point values in the model, here you would base your rules solely on the integer index, and then delegate the floating point conversion to the adapter layer, like this:

    public class Accumulator
        public static DomainInput<double, DomainDoubleSampler> InputMap = DomainInput<double, DomainDoubleSampler>.Empty.Boundary(0.0);

        public static double Compute(int x)
            return Math.Sqrt(InputMap.Input(x));

However, if you need validation to be handled by the model (using return types on rules) you would need to also convert the index to floating point inside the model, just remember to fix the random seed such that the random values sampled are the same between the model and the adapter.

One thing that is particularly nice about this solution is that it’s extendable through the IDomainVariableSampler<T> interface. This means you can extend it to any input domain you like, for example by adjusting the maximum and minimum range of inputs. You can also define your own input domain, one example could be strings following a sequence like: A, B, C, …, Z, AA, AB, … , ZZZZZZZZ – this is a perfectly valid input domain. Also the sampling of partitions is up to you. I used a random sample, but you could also pick the mid-point or any other value inside the partition. All you need to do is implement a sampler for your domain that implements X = IDomainVariableSampler<T>, where T is your primitive input then feed it to DomainInput<T,X>.

No comments:

Post a Comment