Wednesday, May 30, 2012

Modeling Multi-Threaded Applications

So last time we looked a bit at stochastic modeling, and I introduced the Spec Explorer concepts for modeling random actions/decisions. The example we looked at was a stochastic counter – which is just a tiny bit contrived. Actually, I can’t think of anywhere I would want to use a stochastic counter. Thus, I want to spend some time in this post modeling a real stochastic example.
A very common source of stochasticity is multi-threading/parallel processing. In a parallel system, more than one thread can work on solving a problem at the same time (often running on multiple CPUs simultaneously). So why does this have to lead to stochastic systems? – you may ask. The fundamental answer is timing. In a parallel environment, to make full use of system resources, you poll a pool of threads for the next available worker thread. Whichever thread is available is determined by “chance”, because it depends on the speed of the executing CPUs and the latency of external dependencies (RAM, bus, network, disk, etc.), which we (usually) cannot control.
Consider a fictitious parallel environment with 3 worker threads:
This component is used by assigning the Job Scheduler a Job to be processed. The scheduler holds the job in its internal queue until a worker thread becomes available; that thread then takes the incoming Job and processes it in its own execution context (i.e. CPU, RAM, etc.). After successfully processing the Job, the worker thread stores the result in an aggregated result list (which has to be thread-safe).
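The aggregated result list itself is referenced throughout this post but never shown. A minimal sketch of such a thread-safe list could look like the following; the member names (AddResult, Count, GetResults) match the calls used later, but the implementation here is my own assumption, not the sample code:

```csharp
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// A minimal sketch of a thread-safe aggregated result list.
/// All access to the underlying list is serialized by a lock,
/// so worker threads can add results concurrently.
/// </summary>
public static class AggregatedResultList
{
    private static readonly object sync = new object();
    private static readonly List<int> results = new List<int>();

    public static void AddResult(int result)
    {
        lock (sync)
        {
            results.Add(result);
        }
    }

    public static int Count()
    {
        lock (sync)
        {
            return results.Count;
        }
    }

    public static List<int> GetResults()
    {
        lock (sync)
        {
            // Return a copy so callers can enumerate without holding the lock.
            return new List<int>(results);
        }
    }
}
```

A single coarse lock is the simplest correct choice here; with three worker threads there is no meaningful contention to optimize away.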
A very common sunshine scenario to test is to give the Job Scheduler a single Job, check that a worker thread starts working on it, and finally verify the result in the aggregated result list.
Synchronization in models of multi-threaded systems
Multi-threading is at first a tough concept to grasp, and even those skilled at developing parallel systems frequently make mistakes. Minor problems are magnified significantly in a multi-threaded environment, because they can lead to deadlocks or sporadic behavior that is exceptionally difficult to reproduce and debug. One common problem is to assume that a result is ready while it is in fact still being processed by a worker thread – this is a lack of synchronization. In the case of the parallel job scheduler, for example, it is important that we wait for the worker threads to finish processing jobs before we check the results. This may at first seem simple, but it is actually counter-intuitive to implement synchronization actions in a model, and we also need to think about how we slice the model so that only valid test sequences are produced. Consider a model comprising the following three rules:
        [Rule(Action = "ScheduleJob()/result")]
        [Rule(Action = "AwaitResults()")]
        [Rule(Action = "GetResults()/result")]

ScheduleJob is a stochastic rule that creates a job and schedules it on any worker thread. Furthermore, we need a synchronization action – here named AwaitResults – to ensure all worker threads have finished. Finally, GetResults checks that the results are correct. A valid slice of this model would look like:
machine SlicedStochasticWorkerModel() : Main where ForExploration = true
{
    (ScheduleJob{X} ; AwaitResults ; GetResults) || StochasticWorkerModel
}

This slice composes a model where we schedule X jobs, await the results and finally check that they are correct. From a pure modeling perspective, the sequence ( ScheduleJob ; AwaitResults ; ScheduleJob ; GetResults ) would be perfectly valid as well, but in a multi-threaded environment this sequence behaves nondeterministically, so we have to constrain our model using slicing.
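For concreteness, instantiating the slice with X = 2 gives a machine that only admits the properly synchronized trace; this is a sketch in the same Cord notation as above, and the machine name is my own:

```
machine TwoJobWorkerModel() : Main where ForExploration = true
{
    (ScheduleJob ; ScheduleJob ; AwaitResults ; GetResults) || StochasticWorkerModel
}
```

The problematic interleaving ( ScheduleJob ; AwaitResults ; ScheduleJob ; GetResults ) is simply not present in this scenario, so it can never be generated as a test.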
Parallel job scheduler model
A simple model of this system with these three rules could look like (see reference [1] for the full sample code):
        /// <summary>
        /// Schedule a job in the system for processing
        /// </summary>
        /// <returns>The id of the selected worker thread.</returns>
        [Rule(Action = "ScheduleJob()/result")]
        static int ScheduleJobRule()
        {
            int thread = Choice.Some<Int32>(i => i >= 0 && i < 3);
            runningJobs.Add(++nextJobId, thread);
            return thread;
        }

        /// <summary>
        /// Dummy action which causes the model to wait for the worker thread to finish
        /// and deliver their results before checking them
        /// </summary>
        [Rule(Action = "AwaitResults()")]
        static void AwaitResultsRule()
        {
        }

        /// <summary>
        /// Check the results from the worker threads
        /// </summary>
        /// <returns>A checksum of the results computed by the threads.</returns>
        [Rule(Action = "GetResults()/result")]
        static int GetResultsRule()
        {
            return runningJobs.Keys.Sum();
        }

Here, the finite state machine maintains a map of which jobs are running on which worker threads. It is interesting to notice that the synchronization rule is an empty stub: the model does not need to do anything to synchronize, because the model itself is not multi-threaded; only the system under test is. Synchronization takes place at the slicing level of the model.
Exploring this model with a limitation of scheduling a maximum of two jobs yields:
The top section of the model is where we schedule the two jobs, then we have a synchronization action and finally we check the results.
Parallel job scheduler implementation
The system under test is implemented using a standard .NET ThreadPool to process jobs in parallel.
    public class StochasticJobScheduler
    {
        private static int runningJobs = 0;

        public static int ScheduleJob()
        {
            runningJobs++;
            ThreadPool.SetMaxThreads(3, 3);
            ThreadPool.QueueUserWorkItem(WorkerThread.Process, runningJobs);
            return 0;
        }

        public static void AwaitResults()
        {
            // Wait for all threads to complete
            while (AggregatedResultList.Count() < runningJobs)
            {
                Thread.Sleep(100);
            }
        }

        public static int GetResults()
        {
            return AggregatedResultList.GetResults().Sum();
        }
    }
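As a side note on the AwaitResults implementation above: polling with Thread.Sleep works, but it wastes up to 100 ms per check. A blocking alternative is a counter of outstanding jobs guarded by a monitor. The following is a sketch with my own naming, not part of the sample code:

```csharp
using System.Threading;

/// <summary>
/// Sketch of a blocking alternative to the Thread.Sleep polling loop.
/// The scheduler increments the counter when a job is queued, each
/// worker decrements it when done, and AwaitResults blocks until the
/// last worker pulses the monitor.
/// </summary>
public class JobTracker
{
    private readonly object sync = new object();
    private int outstanding;

    public void JobScheduled()
    {
        lock (sync) { outstanding++; }
    }

    public void JobCompleted()
    {
        lock (sync)
        {
            outstanding--;
            if (outstanding == 0)
            {
                // Wake every waiter once the last job has finished.
                Monitor.PulseAll(sync);
            }
        }
    }

    public void AwaitResults()
    {
        lock (sync)
        {
            // Guard against spurious wake-ups by re-checking the counter.
            while (outstanding > 0)
            {
                Monitor.Wait(sync);
            }
        }
    }
}
```

The while loop around Monitor.Wait is essential: a waiter may wake spuriously or after another job has been scheduled, so the condition must be re-checked before returning.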

The job processing is delegated to the WorkerThread.Process method. It is very simple to implement a job scheduler like this, but unfortunately we run into problems while executing the tests. From the ScheduleJob implementation it is evident that we do not adhere to the requirements of the model: we always return zero, telling the model that the first worker thread picked up the job, when in fact we have no way of knowing which thread did. If the results are independent of which thread processed them, you can argue that it doesn’t matter that we ‘lie’ to the model about which thread was picked – but then I would counter by asking why we even model the choice stochastically. Worse, in cases where the result actually does depend on which thread executes the job, we get incorrect results. Observe the following change in the worker processing implementation:
AggregatedResultList.AddResult(Thread.CurrentThread.ManagedThreadId);

This makes each thread report its unique ID as the result of the processing, and it causes our tests to fail sporadically, because we are ‘lying’ to the model.
Okay, so let’s fix this properly: we can let the worker thread report the ID of the last used thread back to the scheduler using proper synchronization. I’m not going to include the implementation details here; see the sample code for how this is done. The result of this change is that the StochasticJobScheduler.ScheduleJob method now returns:
Thread.CurrentThread.ManagedThreadId

of the selected worker thread. Unfortunately, what we end up with is the following failure:
Expected matching return, found return ScheduleJob()/15. Diagnosis:
1.     outputs do not match
comment: checking step 'return ScheduleJob/0'
assert  failed: expected '0', actual '15' (return of ScheduleJob, state S1)
2.     outputs do not match
comment: checking step 'return ScheduleJob/1'
assert  failed: expected '1', actual '15' (return of ScheduleJob, state S1)
3.     outputs do not match
comment: checking step 'return ScheduleJob/2'
assert  failed: expected '2', actual '15' (return of ScheduleJob, state S1)
What this is telling us is that the ID of the picked worker thread was 15, which is not one of the expected outputs from the model (0, 1 or 2). We have a mismatch between the thread IDs expected by the model and the actual IDs supplied by .NET.
In fact, the value 15 is arbitrary; .NET could return anything, as long as the value is unique per thread. .NET hides the information about which threads were created first, because it is simply not relevant when implementing a parallel system on top of a thread pool. In fact, exposing such information could often lead to poorly designed systems, because in a pool of threads the consuming system is supposed to treat all threads as equal.
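A quick way to convince yourself of this is a small stand-alone program (not part of the sample; the names here are mine) that queues a few work items and records the ManagedThreadId each one ran on. The IDs are stable per thread, but their numeric values are unpredictable and rarely 0, 1 or 2:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public static class ThreadIdDemo
{
    // Queues 'jobs' work items on the thread pool and returns the
    // ManagedThreadId each one was executed on.
    public static int[] CollectIds(int jobs)
    {
        var ids = new ConcurrentBag<int>();
        using (var done = new CountdownEvent(jobs))
        {
            for (int i = 0; i < jobs; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // Record which pool thread picked up this work item.
                    ids.Add(Thread.CurrentThread.ManagedThreadId);
                    done.Signal();
                });
            }
            done.Wait();
        }
        return ids.ToArray();
    }

    public static void Main()
    {
        // Prints opaque IDs such as "4, 5, 6" or "12, 15, 12",
        // depending entirely on the runtime.
        Console.WriteLine(string.Join(", ", CollectIds(3)));
    }
}
```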
We can let the adapter layer translate the .NET IDs to an index between 0 and 2 using the following piece of code:
        private static List<int> threads = new List<int>();

        public static int ScheduleJob()
        {
            ...
            int i = GetLastThreadId();
            if (!threads.Contains(i))
            {
                threads.Add(i);
            }
            return threads.IndexOf(i);
        }

This makes sure we don’t ‘lie’ about which thread was picked, which allows Spec Explorer to detect the correct path taken through the stochastic model at runtime. However, it still does not solve the problem where the results depend on which thread picked up the job.
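The mapping itself is simple enough to pull out and test in isolation. A self-contained version (the class name is my own) assigns dense indices in first-seen order, so the first thread ever observed maps to 0 regardless of whether .NET calls it 15 or 42:

```csharp
using System.Collections.Generic;

/// <summary>
/// Maps arbitrary .NET managed thread IDs to dense indices
/// (0, 1, 2, ...) in the order the IDs are first observed.
/// </summary>
public class ThreadIndexMap
{
    private readonly List<int> threads = new List<int>();

    public int ToIndex(int managedThreadId)
    {
        if (!threads.Contains(managedThreadId))
        {
            threads.Add(managedThreadId);
        }
        return threads.IndexOf(managedThreadId);
    }
}
```

Repeated observations of the same ID stay stable, which is exactly the property the model needs: the index identifies a worker consistently across a test run, even though the underlying .NET ID is opaque.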
Conclusion
This leaves us with the final conclusion: for multi-threaded applications, modeling is something of a double-edged sword. On one side, the system should shield away the multi-threaded aspects, and from a testing perspective we should really focus on the overall system behavior and not model the stochastic choices. On the other side, we would still like to model the full-blown system, including the stochastic choices, because they tell us something about how well balanced the implementation is. In the case of the parallel job queue, for example, we would really like to make sure that the work is evenly balanced between the threads. Unfortunately, this requires more information than Spec Explorer provides. Currently, Spec Explorer generates one test case from our model, covering all available paths, but running this test only gives you a single sample of how the system reacted to the specific scenario, which does not tell you a lot about its behavior. Instead, you need to run the test many times and check that the system actually takes different branches through the model; this is actually really difficult to see, because Spec Explorer does not do much to provide you with meaningful results here.
It would be nice if Spec Explorer could visualize which branch was picked by the model, so that you could make a visual inspection of the test path. Were you to repeat the test 1000 times without resetting this visualization, a heat map of the model would start to emerge, which would be very useful for multi-threaded profiling.

References
