Sunday, June 24, 2012

Solving ‘The Monty Hall Problem’ using Models

Today is about my favorite probability puzzle – The Monty Hall problem. If you haven’t heard of it before you are in for a treat. The problem stated goes as follows (from Wikipedia):
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats*. You pick a door, say No. 1 [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat**. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
Vos Savant's response was that the contestant should always switch to the other door. […]
Many readers refused to believe that switching is beneficial. After the Monty Hall problem appeared in Parade, approximately 10,000 readers, including nearly 1,000 with PhDs, wrote to the magazine claiming that vos Savant was wrong. (Tierney 1991) Even when given explanations, simulations, and formal mathematical proofs, many people still do not accept that switching is the best strategy.
*The probability of the car being behind any door is uniformly 1/3
**The door that is opened has to have a goat behind it, and it cannot be the one you picked initially, in case the host has multiple choices it is assumed that s/he chooses uniformly at random
The problem is brilliant in its simplicity and the correct answer feels extremely counter-intuitive at first. The natural tendency is to think that it makes no difference whether you switch or not – but the truth is that you should switch! And hopefully once you are done reading this article you will be convinced why.

Monday, June 18, 2012

Can your tests stand the ‘test of time’?

Today I want to digress a bit from Model-Based Testing and talk about a general issue in software testing, applicable to all types of testing. My focus will be on the use of time-dependent methods in test code. These include library functions like ‘DateTime.Now’, reading the BIOS clock, retrieving the current time from a time server, etc. The use of time dependent methods can lead to unpredictable behavior and is a common source for fragility in tests.
But first, let’s look at a definition of time. From Wikipedia:
Time is the indefinite continued progress of existence and events that occur in apparently irreversible succession from the past through the present to the future.
There are two vital pieces of information in this definition. They are continued progress and irreversible. There is nothing about size and quantity of time in the definition. In fact our daily representation of time in hours, minutes, seconds, milliseconds, etc. is completely arbitrary. Instead I like to think of time in a mathematical sense, as a strictly increasing series t : ti < ti+1. The absolute value of ti is irrelevant.
This is how I believe time should be view in test code, in fact I will postulate a “test of time” at the end of this article.
I will assume that we all agree fragile tests are costly to maintain, and have no place in a regressions suite.
Common misuses of time #1: Validation of time-dependent data
Common misuses of time is validation of messages, dialogs, errors and other strings containing the current time generated by the system under test. I recently saw a test case validating a text containing the latest updated time of a graph. The test would look something like:
            graph.Update();
            String expectedMessage = String.Format("Last update: {0}", DateTime.Now.TimeOfDay.ToString(@"hh\:mm"));
            Assert.AreEqual(graph.StatusMessage, expectedMessage, "Incorrect status message displayed");

Wednesday, May 30, 2012

Modeling Multi-Threaded Applications

So last time we looked a bit at stochastic modeling and I introduced the concept in Spec Explorer for modeling random actions/decisions. The example we looked at was a stochastic counter – which is just a tiny bit contrived. Actually, I can’t think of anywhere I would like to use a stochastic counter. Thus, I want to spent some time modeling a real stochastic example in this post.
A very common use of stochasticity is multi-threading/parallel processing. In a parallel system more than one thread can work on solving a problem at the same time (often running on multiple CPUs at the same time). So why does this have to lead to stochastic systems? – you may ask. The fundamental answer to this question is timing. In a parallel environment, to make full use of system resources, you poll a pool of threads for the next available worker thread. Whichever thread is available is determined by “chance”, because it is impacted by the speed of the executing CPUs, latency of external dependencies (RAM, BUS, network, disk, etc.) which we (usually) cannot control.
Consider a fictitious parallel environment with 3 worker threads:
This component is used by assigning the Job Scheduler a Job to be processed, then the scheduler will hold this job in its internal queue and await a worker thread to be available, once a thread is available it will take the incoming Job and start processing it in its own execution context (i.e. CPU, RAM, etc.). After successfully processing the Job the worker thread stores the result in an aggregated result list (which has to be thread-safe).
A very common sunshine scenario to test would be to give the Job Scheduler a single Job and check that a worker thread starts working on it and finally check the result from the aggregated result list.