Cooperative Computing Lab News: 2017

Monday, December 4, 2017

CCL on Chameleon Cloud with ACIC

As has been a tradition for several years, the CCL had the opportunity to teach about CCTools and distributed computing as part of the Applied Cyberinfrastructure Concepts (ACIC) course at the University of Arizona, taught by Dr. Nirav Merchant and Dr. Eric Lyons. Because of the number of features added recently, this year's session focused primarily on Makeflow and how we use it on the Cloud and with containers. The topics we covered were:

Thinking opportunistically
Overview of the Cooperative Computing Tools
Makeflow
Makeflow using Work Queue as a batch system
Makeflow using the Cloud as a batch system
Specifying and managing resources
Using containers in and on a Makeflow
Specifying Makeflow with JSON and JX

The major topic I want to focus on here is running Makeflow on the Cloud. For several months we have supported Makeflow submitting to Amazon EC2 directly, and an upcoming release will add support for the Amazon Batch system. For this class we also worked on deploying CCTools on Chameleon Cloud, which is a "configurable experimental environment for large-scale cloud research" as described at https://www.chameleoncloud.org/. Chameleon Cloud is a great test bed for researchers to deploy cloud instances, and it uses the OpenStack KVM interface.

TPDS Paper: Storage Management in Makeflow

As the scale of workflows and their data grows, it becomes increasingly difficult to execute within the provided storage. This issue is only exacerbated when attempting to run multiple workflows at the same time, often sharing the same resources. Until recently, the user would often guess at the total size of the workflow and execute until failure, then remove older experiments and data to make room, creating a time-consuming cycle of run->fail->clean until the required space was available.

The initial solution, which is effective in many cases, is to turn on garbage collection. Garbage collection in Makeflow tracks each file from its creation until it is no longer needed, at which point it is deleted. This works well to limit the active footprint of the workflow. However, the user still does not know how much space is needed to execute.

To resolve this, we added an algorithm that estimates the size of the workflow and the minimum storage needed to execute it. This is done by determining the different paths of execution and finding the resulting minimum path(s) through the workflow. The estimate is most accurate when the files in the Makeflow are labeled with their estimated sizes:

.SIZE test.dat 1024K

Using this information, Makeflow can statically analyze the workflow and tell you the minimum and maximum storage needed to execute. This information can then be coupled with a run-time storage manager and garbage collection to stay within a user-specified limit. Instead of actively trying to schedule in an order that prevents going over the limit, nodes are submitted when there is enough space for the node to run and for all of its children. This allows for more concurrency if the space allows. Below is an image that shows how this limit can be used at different levels.

This first image shows a bioinformatics workflow running using the minimum required space. We can see several (10) peaks in the workflow. Each of these corresponds to a larger set of intermediate files that can be removed later. In the naive case where we don't track storage, these can all occur at the same time, using more storage than may be available.

In this second case, we set a limit on storage that is higher than the minimum. We can see similar spikes, but the run-time manager only schedules as many branches as can run under the limit.
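
As a rough illustration of the admission rule described above, the following Python sketch admits a ready node only if its own outputs and those of its direct children fit under the limit. It is a toy model, not Makeflow's implementation: `Node`, `footprint`, `admit`, and all sizes are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    output_size: int                      # estimated size of this node's outputs (MB)
    children: list = field(default_factory=list)


def footprint(node):
    """Space to reserve before submitting: the node's outputs plus its children's."""
    return node.output_size + sum(c.output_size for c in node.children)


def admit(ready, limit, in_use=0):
    """Greedily admit ready nodes while the reserved space stays within the limit."""
    admitted = []
    for node in ready:
        need = footprint(node)
        if in_use + need <= limit:
            admitted.append(node.name)
            in_use += need
    return admitted, in_use


# Four hypothetical branches, each producing 100 MB and feeding a 50 MB child.
branches = [Node("branch%d" % i, 100, [Node("merge%d" % i, 50)]) for i in range(4)]
print(admit(branches, 150))   # (['branch0'], 150): only one branch fits at a time
print(admit(branches, 400))   # (['branch0', 'branch1'], 300): more room, more concurrency
```

With a limit near the minimum, branches run one after another; raising the limit lets more of them run side by side, which matches the behavior described for the two images.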

To use this feature, which is now released and available in the main branch of CCTools, follow these steps:

Label the files. A slight over-estimate works well, since the exact size is often not known ahead of time. The default size is 1G.
```
.SIZE file.name 5M
```

Find the estimated size of the Makeflow.

makeflow --storage-print storage-estimate.dat

Run Makeflow, setting a mode and a limit. The limit can be anywhere between the min and the max. Type 1 indicates min tracking, which keeps usage below the limit that was set.
```
makeflow --storage-type 1 --storage-limit 10G
```
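
For workflows with many files, the labels from the first step could be generated rather than typed by hand. The sketch below is one possible helper, not part of CCTools: `size_label` and its `headroom_percent` parameter are hypothetical, and it assumes that the `K` suffix means 1024 bytes.

```python
def size_label(filename, size_bytes, headroom_percent=10):
    """Return a `.SIZE` line with the size padded by some headroom, rounded up to whole K."""
    padded = size_bytes * (100 + headroom_percent)
    # Integer ceiling division avoids floating point rounding surprises.
    kib = -((-padded) // (100 * 1024))
    return ".SIZE %s %dK" % (filename, max(kib, 1))


# A file measured at 5 MB in an earlier run, padded by 10 percent.
print(size_label("file.name", 5 * 1024 * 1024))   # .SIZE file.name 5632K
```

The byte count could come from `os.path.getsize` on outputs of a previous small run; the padding reflects the advice above that a slight over-estimate is acceptable.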

Monday, November 13, 2017

CCL at Supercomputing 2017

We are well represented at the annual Supercomputing conference this week:

Tim Shaffer is presenting "Taming Metadata Storms in Parallel Filesystems with MetaFS" at the Parallel Data Storage Workshop (PDSW). This paper describes a technique for accelerating metadata-intensive program loading workloads that often cause trouble in parallel filesystems.

Kyle Sweeney is presenting "Lightweight Container Integration into Workflow Systems: A Case Study with Singularity and Makeflow" at Workflows in Support of Large Scale Science (WORKS). This talk explains why integrating containers into workflows isn't as simple as it first appears, and describes a variety of approaches for assembling complete applications from containers and data.

Charles Zheng is presenting "Wharf: Sharing Docker Images across Hosts from a Distributed Filesystem" at the Monday poster session. This work in progress aims to accelerate the use of containers on HPC clusters by sharing images and metadata within a parallel filesystem.

Thursday, October 26, 2017

TPDS Paper: Job Sizing

When submitting jobs for execution to a computing facility, a user must make a critical decision: how many resources (such as cores, memory and disk) should be requested for each job?
Broadly speaking, if the initial job size selected is too small, it is more likely that the job will fail and be returned, thus wasting resources on a failed run that must be retried. On the other hand, if the initial job size selected is too large, the job will succeed on the first try, but waste resources that go unused inside the job's allocation. If the waste is large enough, throughput will be reduced because those resources could have been used to run another job.
If the resources consumed by a collection of jobs were known and constant, then the solution would be easy: run one job at a large size, measure its consumption, and then use that smaller measured size for the remainder of the jobs. However, experience shows that real jobs have non-trivial distributions. For example, the figure shows the histogram of memory consumption for a set of jobs in a high energy physics workflow run on an HTCondor batch system at the University of Notre Dame.

Note that the histogram shows large peaks at approximately 900MB and 1300MB, but there are a small number of outliers both above and below those values.
What memory size should we select for this workload? If we pick 3.8GB RAM for all jobs, then every job will succeed, but most jobs would end up wasting several GB of memory that could be used to run other jobs. On the other hand, we could try a two-step approach, in which each job is first run with a smaller value, and those that fail are rerun with the maximum 3.8GB memory allocation.
But precisely what smaller value should be used for the first attempt? The dotted line, at around 1.32GB, turns out to maximize the throughput when running the workflow under this two-step policy. Allowing for 8% of the tasks to be retried, throughput increases 2.54 times, and wasted resources decrease by 44%.
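
To make the trade-off concrete, here is a simplified Python sketch of choosing the first-attempt allocation. It is not the algorithm from the paper or the CCTools code: it assumes every attempt runs for the same time, so cost is just the memory allocated, and that a failed task is retried exactly once at the maximum. The function name and the sample data are hypothetical.

```python
def best_first_allocation(peaks, max_alloc):
    """Pick the first-attempt allocation that minimizes total memory allocated.

    Simplifying assumptions: all attempts take equal time, and any task whose
    peak exceeds the first allocation is retried once at max_alloc.
    """
    best = None
    for candidate in sorted(set(peaks)):
        retried = sum(1 for p in peaks if p > candidate)
        # Every task pays for the first attempt; retried tasks also pay max_alloc.
        cost = candidate * len(peaks) + max_alloc * retried
        if best is None or cost < best[1]:
            best = (candidate, cost)
    return best


# Made-up peak memory values (MB): two large groups plus a few outliers.
peaks = [900] * 50 + [1300] * 40 + [3800] * 10
print(best_first_allocation(peaks, 3800))   # (1300, 168000)
```

In this made-up workload, running everything at 3800MB would allocate 380000MB in total, while a 1300MB first attempt with 10 retries allocates 168000MB.
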
In our recent paper A Job Sizing Strategy for High-Throughput Scientific Workflows we fully describe the two-step strategy outlined above. These developments have also been integrated into Makeflow and Work Queue in CCTools. For Makeflow, the rules need to be labeled with the optimization mode:



.MAKEFLOW CATEGORY myfirstcategory
.MAKEFLOW MODE MAX_THROUGHPUT

output_1: input_1
    cmdline input_1 -o output_1

output_2: input_2
    cmdline input_2 -o output_2


.MAKEFLOW CATEGORY myothercategory
.MAKEFLOW MODE MAX_THROUGHPUT

output_3: input_3
    cmdline input_3 -o output_3

output_4: input_4
    cmdline input_4 -o output_4

Also, Makeflow needs to run with the resource monitor enabled, as:
makeflow --monitor=my_resource_summaries_dir (... other options ...)
Rules in the same category will be optimized together.
Similarly, for Work Queue:



q = WorkQueue(...)
q.enable_monitoring()

q.specify_category_mode('myfirstcategory', WORK_QUEUE_ALLOCATION_MODE_MAX_THROUGHPUT)

t = Task(...)
t.specify_category('myfirstcategory')

Additionally, we have made a pure Python implementation available at:
https://github.com/cooperative-computing-lab/efficient-resource-allocations

Wednesday, October 18, 2017

Makeflow Feature: JX Representation

There are a number of neat new features in the latest versions of our software that I would like to highlight through some occasional blog posts. If these sound interesting, please give them a try and send us your feedback.

First, I would like to highlight recent work by Tim Shaffer on JX, a new encoding for Makeflow that makes it easier to express complex workflows programmatically.

For example, a traditional makeflow rule looks like this:

out.txt: in.txt calib.dat simulate.exe
    simulate.exe -i in.txt -p 10 > out.txt

In the latest version of Makeflow, you can write the same rule in JSON like this:

{
    "command" : "simulate.exe -i in.txt -p 10 > out.txt",
    "inputs" : [ "in.txt", "calib.dat", "simulate.exe" ],
    "outputs": [ "out.txt" ]
}

Now, just using JSON by itself doesn't give you a whole lot. However, we extended JSON with a few new features like list comprehensions, variable substitutions, and operators. This gives us a programmable way of generating a lot of rules easily.

For example, this represents 100 rules where the parameter varies from 0-99:

{
    "command" : format("simulate.exe -i in.txt -p %d > out.%d.txt",param,param),
    "inputs" : [ "in.txt", "calib.dat", "simulate.exe" ],
    "outputs": [ format("out.%d.txt",param) ]
} for param in range(100)
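
To see what the comprehension stands for, the same 100 rules can be written out as plain JSON objects with a few lines of Python. This is only an illustration of the expansion, not how Makeflow evaluates JX; the function name `expand_rules` is hypothetical.

```python
import json


def expand_rules(count):
    """Build one plain-JSON rule per parameter value, mirroring the JX comprehension."""
    return [
        {
            "command": "simulate.exe -i in.txt -p %d > out.%d.txt" % (param, param),
            "inputs": ["in.txt", "calib.dat", "simulate.exe"],
            "outputs": ["out.%d.txt" % param],
        }
        for param in range(count)
    ]


rules = expand_rules(100)
print(len(rules))                        # 100
print(json.dumps(rules[99], indent=2))   # the rule for param = 99
```

The JX form keeps this expansion inside the workflow description itself, so no separate generator script is needed.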

For a more detailed example, see these example BWA workflows expressed in three different ways:

Thanks to Andrew Litteken for converting and testing many of our example workflows into the new format.

Monday, October 9, 2017

Announcement: CCTools 6.2.0 released

The Cooperative Computing Lab is pleased to announce the release of version 6.2.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a major release which adds several features and bug fixes. Among them:

[JX] A superset of JSON to dynamically describe workflows, see doc/jx.html. (Tim Shaffer)
[Makeflow] Support for Amazon EC2. (Kyle Sweeney, Douglas Thain)
[Makeflow] Singularity support bug fixes. (Kyle Sweeney)
[Parrot] Fix CVMFS initialization. (Tim Shaffer)
[Prune] Several bug fixes. (Peter Ivie)
[ResourceMonitor] Measurement snapshots by observing log files, --snapshot-events. (Ben Tovar)
[WorkQueue] Compressed updates to the catalog server. (Nick Hazekamp, Douglas Thain)
[WorkQueue] work_queue_factory uses computed maximum worker capacity of the master. (Nate Kremer-Herman)
[WorkQueue] Several bug fixes. (Nick Hazekamp, Ben Tovar)
[WQ_Maker] Several bug fixes. (Nick Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

Jakob Blomer
Nathaniel Kremer-Herman
Nicholas Hazekamp
Peter Ivie
Tim Shaffer
Douglas Thain
Ben Tovar
Kyle Sweeney
Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Wednesday, August 30, 2017

2017 DISC Summer REU Conclusion

This summer, we hosted 9 outstanding undergraduate students in our summer REU program in Data Intensive Scientific Computing (DISC). Our guests spent the summer working with faculty in labs across campus in fields such as astronomy, high energy physics, bioinformatics, data visualization, and distributed systems. And, they enjoyed some summer fun around South Bend.

Check out these short YouTube clips that explain each research project:

And here they are presenting at our summer research symposium:

If you would like to participate, please apply for the 2018 edition of the DISC REU program at Notre Dame.

Tuesday, August 29, 2017

Announcement: CCTools 6.1.6 released

The Cooperative Computing Lab is pleased to announce the release of version 6.1.6 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a minor release which adds some bug fixes. Among them:

[General] Fix bug configuring perl paths. (Ben Tovar)
[General] Fix bug on JX inline querying strings. (Tim Shaffer)
[Makeflow] Fix bug when waiting on a local process. (Douglas Thain)
[Makeflow] Enforce local resources limits. (Douglas Thain)
[Makeflow] Save failed outputs to aid debugging. (Tim Shaffer)
[Makeflow] Fix bug when parsing of some command lines. (Ben Tovar)
[WorkQueue] Transactions log adds workers disconnection reason. (Ben Tovar)
[WorkQueue] Fix bug when checking version of gnuplot in work_queue_graph_log. (Ben Tovar)
[WQmaker] Support dynamic environments. (Nicholas Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

Nathaniel Kremer-Herman
Nicholas Hazekamp
Peter Ivie
Tim Shaffer
Douglas Thain
Ben Tovar
Kyle Sweeney
Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Tuesday, June 27, 2017

Talk at ScienceCloud Workshop

Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC. This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.

Monday, May 22, 2017

Congrats to Ph.D. Graduates

Congratulations to all of our 2017 Ph.D. graduates in Computer Science and Engineering,
and especially to Dr. Haiyan Meng who is moving on to a position at Google, Inc.

Wednesday, May 17, 2017

Announcement: CCTools 6.1.0 released

The Cooperative Computing Lab is pleased to announce the release of version 6.1.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a major release which adds several features and bug fixes. Among them:

[General] IPv4 and IPv6 mode handling fixes. (Tim Shaffer)
[Grow] Fuse module for GROW-FS. (Tim Shaffer)
[Makeflow] Updated manual. (Douglas Thain)
[Makeflow] Support for Mesos. (Charles Zheng)
[Makeflow] Support for Singularity. (Kyle Sweeney)
[Makeflow] --shared-fs option. (Nick Hazekamp)
[Makeflow] --preserve option for per rule caching. (Pierce Cunneen)
[Parrot] Fix ld.so and exec bugs. (Tim Shaffer)
[Parrot] Fix handling of namespace symlinks. (Tim Shaffer)
[Parrot] Bind/connect on AF_UNIX sockets. (Tim Shaffer)
[Parrot] Several bug fixes. (Tim Shaffer)
[Parrot] parrot_namespace to use parrot inside parrot. (Tim Shaffer)
[Parrot] Add fixed and warped PIDs. (Douglas Thain)
[Prune] Several bug fixes. (Peter Ivie)
[ResourceMonitor] Resource per task snapshots, --snapshot-file. (Ben Tovar)
[Umbrella] Several bug fixes. (Haiyan Meng)
[WorkQueue] Resource per task snapshots, q.enable_monitoring_snapshots. (Ben Tovar)
[WorkQueue] Several bug fixes. (Ben Tovar)
[WorkQueue] Custom environments for wq factory. (Kyle Sweeney)
[WorkQueue] Several fixes to wq factory. (Nate Kremer-Herman)
[WQ_Maker] Several bug fixes, update to latest version of maker. (Nick Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

Jakob Blomer
Pierce Cunneen
Patrick Donnelly
Nathaniel Kremer-Herman
Nicholas Hazekamp
Peter Ivie
Haiyan Meng
Tim Shaffer
Douglas Thain
Ben Tovar
Kyle Sweeney
Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Friday, May 5, 2017

Makeflow and Mesos Paper at CCGrid 2017

Charles Zheng will present the paper Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos at the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017) on May 15, 2017 in Madrid, Spain. In this paper we consider how to launch workflow systems on container schedulers with minimal performance loss and higher system efficiency. As examples of current technology, we use Makeflow, Work Queue, Resource Monitor and Mesos. We observe that using Work Queue and Resource Monitor not only reduces the task turnaround time but also achieves a higher resource usage rate. Following is the system architecture.

Monday, May 1, 2017

Workflow Reproducibility Paper at ICCS 2017

Haiyan Meng will present a paper titled Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications at the International Conference on Computational Science (ICCS) 2017 this June in Zurich, Switzerland. This paper explores the challenges in reproducing scientific workflows, and proposes a framework for facilitating the reproducibility of scientific workflows at the task level by giving scientists complete control over the execution environments of the tasks in their workflows and integrating execution environment specifications into scientific workflow systems.

Wednesday, March 22, 2017

Ph.D. Defense: Haiyan Meng

Haiyan Meng successfully defended her dissertation titled "Improving the Reproducibility of Scientific Applications with Execution Environment Specifications". Congratulations!

Thursday, March 9, 2017

Makeflow Examples Archive

We recently updated our archive of example Makeflows so that they are significantly easier to download, execute, and reshape to various sizes. For each one, we have instructions on how to obtain the underlying binary program, generate some sample data, and then create a workload of arbitrary size. This allows you to experiment with Makeflow at small scale, and then dial things up when you are ready to run on thousands of nodes:

https://github.com/cooperative-computing-lab/makeflow-examples

Friday, February 17, 2017

Big CMS Data Analysis at Notre Dame

Analyzing the data produced by the Compact Muon Solenoid (CMS), one of the experiments at the Large Hadron Collider, requires a collaboration of physicists and computer scientists to harness hundreds of thousands of computers at universities and research labs around the world. The contribution of each site to the global effort, whether small or large, is reported out on a regular basis.

This recent graph tells an interesting story about contributions to CMS computing in late 2016. Each color in the bar graph represents the core-hours provided by a given site over the course of a week:

The various computing sites are divided into tiers:

Tier 0 is CERN, which is responsible for providing data to the lower tiers.
Tier 1 contains the national research labs like Fermi National Lab (FNAL), Rutherford Appleton Lab in the UK, and so forth, that facilitate analysis work for universities in their countries.
Tier 2 contains universities like Wisconsin, Purdue, and MIT, that have significant shared computing facilities dedicated to CMS data analysis.
Tier 3 is everyone else performing custom data analysis, sometimes on private clusters, and sometimes on borrowed resources. Most of those are so small that they are compressed into black at the bottom of the graph.

Now, you would think that the big national sites would produce most of the cycles, but there are a few interesting exceptions at the top of the list.

First, there are several big bursts in dark green that represent the contribution of the HEPCloud prototype, which is technically a Tier-3 operation, but is experimenting with consuming cycles from Google and Amazon. This has been successful at big bursts of computation, and the next question is whether this will be cost-effective over the long term.

Next, the Tier-2 at the University of Wisconsin consistently produces a huge number of cycles from their dedicated facility and opportunistic resources from the Center for High Throughput Computing. This group works closely with the HTCondor team at Wisconsin to make sure every cycle gets used, 365 days a year.

Following that, you have the big computing centers at CERN and FNAL, which is no surprise.

And, then the next contributor is our own little Tier-3 at Notre Dame, which frequently produces more cycles than most of the Tier-2s and some of the Tier-1s! The CMS group at ND harnesses a small dedicated cluster, and then adds to that unused cycles from our campus Center for Research Computing by using Lobster and the CCL Work Queue software on top of HTCondor.

The upshot is, on a good day, a single grad student from Notre Dame can perform data analysis at a scale that rivals our national computing centers!

Thursday, February 2, 2017

IceCube Flies with Parrot and CVMFS

IceCube is a neutrino detector built at the South Pole by instrumenting about a cubic kilometer of ice with 5160 light sensors. The IceCube data is analyzed by a collaboration of about 300 scientists from 12 countries. Data analysis relies on the precise knowledge of detector characteristics, which are evaluated by vast amounts of Monte Carlo simulation. On any given day, 1000-5000 jobs are continuously running.

Recently, the experiment began using Parrot to get their code running on GPU clusters at XSEDE sites (Comet, Bridges, and xStream) and the Open Science Grid. IceCube relies on software distribution via CVMFS, but not all execution sites provide the necessary FUSE modules. By using Parrot, jobs can attach to remote software repositories without requiring special privileges or kernel modules.

- Courtesy of Gonzalo Merino, University of Wisconsin - Madison