Our colleagues Henri Casanova (U Hawaii) and Rafael Ferreira da Silva (USC), along with their students, have recently published a paper highlighting their work in the WRENCH project. They have constructed a series of simulators that model the behavior of distributed systems, for the purposes of both performance prediction and education.
In their paper "Developing accurate and scalable simulators of production workflow management systems with WRENCH" they describe simulators that correspond to the Pegasus workflow management system and to our own Work Queue distributed execution framework.
Of course, any simulation is an imperfect approximation of a real system, but what's interesting about the WRENCH simulations is that they allow us to verify the basic assumptions and behavior of a software implementation. In this example, the real system and the simulation show the same overall behavior, except that the real system has a stair-step behavior:
So, does that mean the simulation is "wrong"? Not really! In this case, the software is showing an undesirable behavior, due either to incorrect logging or possibly to a convoy effect. In short, the simulation helps us find a bug relative to the "ideal" design. Nice!