Tuesday, June 27, 2017

Talk at ScienceCloud Workshop

Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC.  This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.

Monday, May 22, 2017

Congrats to Ph.D. Graduates

Congratulations to all of our 2017 Ph.D. graduates in Computer Science and Engineering,
and especially to Dr. Haiyan Meng, who is moving on to a position at Google, Inc.

Wednesday, May 17, 2017

Announcement: CCTools 6.1.0 released

The Cooperative Computing Lab is pleased to announce the release of version 6.1.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:

This is a major release which adds several features and bug fixes. Among them:

  • [General]  IPv4 and IPv6 mode handling fixes. (Tim Shaffer)
  • [Grow]  Fuse module for GROW-FS. (Tim Shaffer)
  • [Makeflow]  Updated manual. (Douglas Thain)
  • [Makeflow]  Support for Mesos. (Charles Zheng)
  • [Makeflow]  Support for Singularity. (Kyle Sweeney)
  • [Makeflow]  --shared-fs option. (Nick Hazekamp)
  • [Makeflow]  --preserve option for per rule caching. (Pierce Cunneen)
  • [Parrot]  Fix ld.so and exec bugs. (Tim Shaffer)
  • [Parrot]  Fix handling of namespace symlinks. (Tim Shaffer)
  • [Parrot]  Bind/connect on AF_UNIX sockets. (Tim Shaffer)
  • [Parrot]  Several bug fixes. (Tim Shaffer)
  • [Parrot]  parrot_namespace to use parrot inside parrot. (Tim Shaffer)
  • [Parrot]  Add fixed and warped PIDs. (Douglas Thain)
  • [Prune]  Several bug fixes. (Peter Ivie)
  • [ResourceMonitor]  Resource per task snapshots, --snapshot-file. (Ben Tovar)
  • [Umbrella]  Several bug fixes. (Haiyan Meng)
  • [WorkQueue]  Resource per task snapshots, q.enable_monitoring_snapshots. (Ben Tovar)
  • [WorkQueue]  Several bug fixes. (Ben Tovar)
  • [WorkQueue]  Custom environments for wq factory. (Kyle Sweeney)
  • [WorkQueue]  Several fixes to wq factory. (Nate Kremer-Herman)
  • [WQ_Maker]  Several bug fixes, update to latest version of maker. (Nick Hazekamp)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Jakob Blomer
  • Pierce Cunneen
  • Patrick Donnelly
  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Peter Ivie
  • Haiyan Meng
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list:



Friday, May 5, 2017

Makeflow and Mesos Paper at CCGrid 2017

Charles Zheng will present the paper Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos at the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017) on May 15, 2017 in Madrid, Spain. In this paper we consider how to launch workflow systems on container schedulers with minimal performance loss and higher system efficiency. As examples of current technology, we use Makeflow, Work Queue, Resource Monitor and Mesos. We observe that using Work Queue and Resource Monitor not only reduces the task turnaround time but also achieves a higher resource usage rate. Following is the system architecture.

Monday, May 1, 2017

Workflow Reproducibility Paper at ICCS 2017

Haiyan Meng will present a paper titled Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications at the International Conference on Computational Science (ICCS) 2017 this June in Zurich, Switzerland. This paper explores the challenges in reproducing scientific workflows, and proposes a framework for facilitating the reproducibility of scientific workflows at the task level by giving scientists complete control over the execution environments of the tasks in their workflows and integrating execution environment specifications into scientific workflow systems.

Wednesday, March 22, 2017

Ph.D. Defense: Haiyan Meng

Haiyan Meng successfully defended her dissertation titled "Improving the Reproducibility of Scientific Applications with Execution Environment Specifications".  Congratulations!

Thursday, March 9, 2017

Makeflow Examples Archive

We recently updated our archive of example Makeflows so that they are significantly easier to download, execute, and reshape to various sizes.  For each one, we have instructions on how to obtain the underlying binary program, generate some sample data, and then create a workload of arbitrary size.  This allows you to experiment with Makeflow at small scale, and then dial things up when you are ready to run on thousands of nodes:


Friday, February 17, 2017

Big CMS Data Analysis at Notre Dame

Analyzing the data produced by the Compact Muon Solenoid (CMS), one of the experiments at the Large Hadron Collider, requires a collaboration of physicists and computer scientists to harness hundreds of thousands of computers at universities and research labs around the world.  The contribution of each site to the global effort, whether small or large, is reported out on a regular basis.

This recent graph tells an interesting story about contributions to CMS computing in late 2016.  Each color in the bar graph represents the core-hours provided by a given site over the course of a week:

The various computing sites are divided into tiers:

  • Tier 0 is CERN, which is responsible for providing data to the lower tiers.
  • Tier 1 contains the national research labs like Fermi National Lab (FNAL), Rutherford Appleton Lab in the UK, and so forth, that facilitate analysis work for universities in their countries.
  • Tier 2 contains universities like Wisconsin, Purdue, and MIT, that have significant shared computing facilities dedicated to CMS data analysis.
  • Tier 3 is everyone else performing custom data analysis, sometimes on private clusters, and sometimes on borrowed resources. Most of those are so small that they are compressed into black at the bottom of the graph.

Now, you would think that the big national sites would produce most of the cycles, but there are a few interesting exceptions at the top of the list.

First, there are several big bursts in dark green that represent the contribution of the HEPCloud prototype, which is technically a Tier-3 operation, but is experimenting with consuming cycles from Google and Amazon.  This has been successful at big bursts of computation, and the next question is whether this will be cost-effective over the long term.

Next, the Tier-2 at the University of Wisconsin consistently produces a huge number of cycles from their dedicated facility and opportunistic resources from the Center for High Throughput Computing.  This group works closely with the HTCondor team at Wisconsin to make sure every cycle gets used, 365 days a year.

Following that, you have the big computing centers at CERN and FNAL, which is no surprise.

And then, the next contributor is our own little Tier-3 at Notre Dame, which frequently produces more cycles than most of the Tier-2s and some of the Tier-1s!  The CMS group at ND harnesses a small dedicated cluster, and then adds to that unused cycles from our campus Center for Research Computing by using Lobster and the CCL Work Queue software on top of HTCondor.

The upshot is, on a good day, a single grad student from Notre Dame can perform data analysis at a scale that rivals our national computing centers!

Thursday, February 2, 2017

IceCube Flies with Parrot and CVMFS

IceCube is a neutrino detector built at the South Pole by instrumenting about a cubic kilometer of ice with 5160 light sensors. The IceCube data is analyzed by a collaboration of about 300 scientists from 12 countries. Data analysis relies on the precise knowledge of detector characteristics, which are evaluated by vast amounts of Monte Carlo simulation.  On any given day, 1000-5000 jobs are continuously running.

Recently, the experiment began using Parrot to get their code running on GPU clusters at XSEDE sites (Comet, Bridges, and xStream) and the Open Science Grid.  IceCube relies on software distribution via CVMFS, but not all execution sites provide the necessary FUSE modules.  By using Parrot, jobs can attach to remote software repositories without requiring special privileges or kernel modules.

- Courtesy of Gonzalo Merino, University of Wisconsin - Madison

Tuesday, October 25, 2016

Reproducibility Papers at eScience 2016

CCL students presented two papers at the IEEE 12th International Conference on eScience on the theme of reproducibility in computational science:
Congrats to Haiyan for winning a Best of Conference award for her paper: