However, many people have large workloads that do not have a regular structure. They may have one program that generates three output files, each consumed by a different program, with the results then joined back together. You can think of such a workload as a directed graph of processes and files, like the figure to the right.
If each of these programs can run for a long time, then you need a workflow engine that will keep track of the state of the graph, submit the jobs for execution, and deal with failures. A number of workflow engines exist today, but, without naming names, they aren't exactly easy to use. The workflow designer has to write a whole lot of batch scripts, or XML, or learn a rather complicated language to put it all together.
We recently wondered if there was a simpler way to accomplish this. A good workflow language should make it easy to do simple things, and at least obvious (if not easy) how to specify complex things. For implementation reasons, a workflow language also needs to state the data needs of an application clearly: if we know in advance that program A needs file X, then we can deliver it efficiently before A begins executing. If possible, it shouldn't require the user to know anything about the underlying batch or grid systems.
After scratching our heads for a while, we finally came to the conclusion that good old Make is an attractive workflow language. It is very compact, it states data dependencies nicely, and lots of people already know it. So, we have built a workflow system called Makeflow, which takes Makefiles and runs them on parallel and distributed systems. Using Makeflow, you can take a very large workflow and run it on your local workstation, a single 32-core server, or a 1000-node Condor pool.
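For example, the same Makefile can be dispatched to different systems just by changing how Makeflow is invoked. The commands below are a rough sketch: the -T option selects the batch system in Makeflow, but the exact option names may vary by version, and simulation.makeflow is just a made-up filename.

# Run the workflow with local processes on your workstation or multicore server:
makeflow -T local simulation.makeflow

# Or submit the very same workflow to a Condor pool:
makeflow -T condor simulation.makeflow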
What makes Makeflow different from previous distributed makes is that it does not rely on a distributed file system. Instead, it uses the dependency information already present in the Makefile to send data to remote jobs. For example, if you have a rule like this:
output.data final.state : input.data mysim.exe
	./mysim.exe -temp 325 input.data
then Makeflow will ensure that the input files input.data and mysim.exe are placed at the worker node before running mysim.exe. Afterwards, Makeflow brings the output files back to the initiator.
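To make that concrete, here is a sketch of the kind of workflow described at the top of this post, written as an ordinary Makefile: a split program produces three parts, each part is analyzed independently (so those three rules can run in parallel), and a final rule joins the results. The program and file names (split.exe, analyze.exe, join.exe, and so on) are made up for illustration.

part1.data part2.data part3.data : input.data split.exe
	./split.exe input.data

result1.data : part1.data analyze.exe
	./analyze.exe part1.data > result1.data

result2.data : part2.data analyze.exe
	./analyze.exe part2.data > result2.data

result3.data : part3.data analyze.exe
	./analyze.exe part3.data > result3.data

final.data : result1.data result2.data result3.data join.exe
	./join.exe result1.data result2.data result3.data > final.data

Because each rule lists everything it reads and writes, Makeflow knows exactly which files to ship to a worker for each job, and which files to bring back when the job completes.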
Because of this property, you don't need a data center in order to run a Makeflow. We provide a simple process called worker that you can run on your desktop, your laptop, or any other old computers you have lying around. The workers call home to Makeflow, which coordinates the execution on whatever machines you have available.
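Getting a small pool of workers going might look roughly like this; the exact arguments differ between versions, so treat the hostname and port below as placeholders rather than a definitive recipe.

# On the machine driving the workflow, run Makeflow in master/worker mode
# (here assuming -T wq selects the built-in master/worker engine):
makeflow -T wq simulation.makeflow

# On each spare machine, start a worker and point it at the master:
worker master.example.edu 9123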
You can download and try out Makeflow yourself from the CCL web site.
Makeflow works with a number of remote 'worker' processes. Each worker calls home to the Makeflow master, which then sends the needed files directly over that connection to the worker. So, the master controls all of the data transfers, and you don't need a shared filesystem.
If you know in advance that all the tasks will take about the same amount of time, then you can cancel slow tasks that fall far outside the distribution. (See Fail Fast, Fail Often.) However, for Make in general, you don't know in advance how long each task will take, so you can't do much better than simply waiting for each task to succeed or fail on its own.
I'll have to try Makeflow. I have an environment where there are 3 to 4 levels of dependency to process. The final object sits and produces live data. Environmental changes trigger a rebuild stage that is a DAG, and I was planning to use parallel make, run periodically, to manage the whole thing.
I was drawn to make for about the same reasons you state: it is well known and easy to program. I'd add to that: it's robust, stable, well-debugged software.
I was going over the source code for gmake, though, and there's a bit of cruft in there.
Ideally, I would love a make daemon, capable of reacting to changes in the file system and separating the three traditional stages of (1) constructing the dependency tree, (2) determining the required update trees, and (3) executing the resulting forest of updates. I would like all three stages as separate daemons, with the execution phase capable of mutex-locking portions of the dependency tree until they are completed.