In previous articles, I have introduced the idea of
abstractions for distributed computing. An abstraction is a way of specifying a large amount of work so that it can be distributed across a large computing system. All of the abstractions I have discussed so far have a compact, regular structure.
However, many people have large workloads that do not have a regular structure. They may have one program that generates three output files, each consumed by another program, whose results are then joined back together. You can think of such a workload as a directed graph of processes and files, like the figure to the right.
If each of these programs may run for a long time, then you need a workflow engine that will keep track of the state of the graph, submit the jobs for execution, and deal with failures. There exist a number of workflow engines today, but without naming names, they aren't exactly easy to use. The workflow designer has to write a whole lot of batch scripts, or XML, or learn a rather complicated language to put it all together.
We recently wondered if there was a simpler way to accomplish this. A good workflow language should make it easy to do simple things, and at least obvious (if not easy) how to specify complex things. For implementation reasons, a workflow language also needs to state the data needs of an application clearly: if we know in advance that program A needs file X, then we can deliver it efficiently before A begins executing. If possible, it shouldn't require the user to know anything about the underlying batch or grid systems.
After scratching our heads for a while, we finally came to the conclusion that good old Make is an attractive workflow language. It is very compact, it states data dependencies nicely, and lots of people already know it. So, we have built a workflow system called Makeflow, which takes Makefiles and runs them on parallel and distributed systems. Using Makeflow, you can take a very large workflow and run it on your local workstation, a single 32-core server, or a 1000-node Condor pool.
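To make that concrete, the fan-out, fan-in workload described earlier might be written as a Makefile like the one below. This is just an illustrative sketch: the program and file names (split.exe, analyze.exe, join.exe, and so on) are invented for the example.

part1.data part2.data part3.data : input.data split.exe
	./split.exe input.data

result1.data : part1.data analyze.exe
	./analyze.exe part1.data > result1.data

result2.data : part2.data analyze.exe
	./analyze.exe part2.data > result2.data

result3.data : part3.data analyze.exe
	./analyze.exe part3.data > result3.data

final.data : result1.data result2.data result3.data join.exe
	./join.exe result1.data result2.data result3.data > final.data

Each rule names its output files, its input files, and the command that turns one into the other, so the entire graph is captured in a few lines.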
What makes Makeflow different from previous distributed makes is that it does not rely on a distributed file system. Instead, it uses the dependency information already present in the Makefile to send data to remote jobs. For example, if you have a rule like this:
output.data final.state : input.data mysim.exe
	./mysim.exe -temp 325 input.data
then Makeflow will ensure that the input files input.data and mysim.exe are placed at the worker node before running mysim.exe. Afterwards, Makeflow brings the output files back to the initiator.
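Running a workflow is a matter of invoking makeflow on the Makefile and telling it which execution engine to use. As a rough sketch (the exact option names may vary between Makeflow releases, and simulation.makefile is just a placeholder name for the file containing the rule above):

makeflow simulation.makefile              # run the rules on the local machine
makeflow -T condor simulation.makefile    # submit each rule as a Condor job

The Makefile itself does not change between the two cases; only the choice of execution engine does.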
Because of this property, you don't need a data center in order to run a Makeflow. We provide a simple process called worker that you can run on your desktop, your laptop, or any other old computers you have lying around. The workers call home to Makeflow, which coordinates the execution on whatever machines you have available.
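As a sketch of that setup, assuming Makeflow's Work Queue mode and its work_queue_worker program (the host name and port number here are purely illustrative):

makeflow -T wq -p 9123 simulation.makefile    # start the workflow, listening for workers on port 9123
work_queue_worker master.example.edu 9123     # on each spare machine, connect back to the workflow

Workers can join or leave at any time; Makeflow simply hands out tasks to whichever workers are connected.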
You can download and try out Makeflow yourself from the CCL web site.