Thursday, October 26, 2017

TPDS Paper: Job Sizing

When submitting jobs for execution to a computing facility, a user must make a critical decision: how many resources (such as cores, memory and disk) should be requested for each job?
Broadly speaking, if the initial job size selected is too small, it is more likely that the job will fail and be returned, thus wasting resources on a failed run that must be retried. On the other hand, if the initial job size selected is too large, the job will succeed on the first try, but waste resources that go unused inside the job's allocation. If the waste is large enough, throughput will be reduced because those resources could have been used to run another job.
If the resources consumed by a collection of jobs were known and constant, then the solution would be easy: run one job at a large size, measure its consumption, and then use that smaller measured size for the remainder of the jobs. However, experience shows that real jobs have non-trivial distributions. For example, the figure shows the histogram of memory consumption for a set of jobs in a high energy physics workflow run on an HTCondor batch system at the University of Notre Dame.
Note that the histogram shows large peaks at approximately 900MB and 1300MB, but there are small number of outliers both above and below those values.
What memory size should we select for this workload? If we pick 3.8GB RAM for all jobs, then every job will succeed, but then most jobs would end up wasting several GB of memory that could be used to run other jobs. On the other hand, we could try a two-step approach, in which each job is run with a smaller value, wait to see which ones succeed or fail, and those that fail are run with the maximum 3.8GB memory allocation.
But precisely what smaller value should be used for the first attempt? The dotted line, at around 1.32GB, turns out to maximize the throughput when running the workflow under this two-step policy. Allowing for %8 of the tasks to be retried, throughput increases 2.54 times, and resources wasted decreased %44.
In our recent paper A Job Sizing Strategy for High-Throughput Scientific Workflows we fully describe the two-step strategy described above. These developments have also been integrated to makeflow and work queue in CCTools. For makeflow, the rules need to be labeled with the optimization mode:

.MAKEFLOW CATEGORY myfirstcategory

output_1: input_1
    cmdline input_1 -o output_1

output_2: input_2
    cmdline input_2 -o output_2

.MAKEFLOW CATEGORY myothercategory

output_3: input_3
    cmdline input_3 -o output_3

output_4: input_4
    cmdline input_4 -o output_4

Also, makeflow needs to run with the resource monitor enabled, as:
makeflow --monitor=my_resource_summaries_dir (... other options ...)
Rules in the same category will be optimized together.
Similarly, for work queue:

q = WorkQueue(...)

q.specify_category_mode('myfirstcategory', WORK_QUEUE_ALLOCATION_MODE_MAX_THROUGHPUT)

t = Task(...)

Additionally, we have made available a pure python implementation at:

Wednesday, October 18, 2017

Makeflow Feature: JX Representation

There are a number of neat new features in the latest versions of our software that I would like to highlight through some occasional blog posts.  If these sound interesting, please give them a try and send us your feedback.

First, I would like to highlight recent work by Tim Shaffer on JX, a new encoding for Makeflow that makes it easier to express complex workflows programmatically.  

For example, a traditional makeflow rule looks like this:

out.txt: in.txt calib.dat simulate.exe
    simulate.exe -i in.txt -p 10 > out.txt

In the latest version of Makeflow, you can write the same rule in JSON like this:

    "command" : "simulate.exe -i in.txt -p 10 > out.txt",
    "inputs" : [ "in.txt", "calib.dat", "simulate.exe" ],
    "outputs": [ "out.txt" ]

Now, just using JSON by itself doesn't give you a whole lot.  However, we extended JSON with a few new features like list comprehensions, variables substitutions, and operators.  This gives us a programmable way of generating a lot of rules easily.

For example, this represents 100 rules where the parameter varies from 0-99:

   "command" : format("simulate.exe -i in.txt -p %d > out.%d.txt",param,param),
   "inputs" : [ "in.txt", "calib.dat", "simulate.exe" ],
   "outputs": [ format("out.%d.txt",param) ]

} for param in range(100)

For a more detailed example, see these example BWA workflows expressed in three different ways:
Thanks to Andrew Litteken for converting and testing many of our example workflows into the new format.

Monday, October 9, 2017

Announcement: CCTools 6.2.0 released

The Cooperative Computing Lab is pleased to announce the release of version 6.2.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:

This is a major which adds several features and bug fixes. Among them:

  • [JX] A superset of JSON to dynamically describe workflows, see doc/jx.html. (Tim Shaffer)
  • [Makeflow] Support for Amazon EC2. (Kyle Sweeney, Douglas Thain)
  • [Makeflow]  Singularity support bug fixes. (Kyle Sweeney)
  • [Parrot] Fix CVMFS initialization. (Tim Shaffer)
  • [Prune] Several bug fixes. (Peter Ivie)
  • [ResourceMonitor] Measurement snapshots by observing log files, --snapshot-events. (Ben Tovar)
  • [WorkQueue] Compressed updates to the catalog server. (Nick Hazekamp, Douglas Thain)
  • [WorkQueue] work_queue_factory uses computed maximum worker capacity of the master. (Nate Kremer-Herman)
  • [WorkQueue] Several bug fixes. (Nick Hazekamp, Ben Tovar)
  • [WQ_Maker] Several bug fixes. (Nick Hazekamp)

Thanks goes to the contributors for many features, bug fixes, and tests:

  • Jakob Blomer
  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Peter Ivie
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list: