Tuesday, October 2, 2018

Work Queue Visual Status

Check out the new Work Queue Status page by Nate Kremer-Herman.  This reveals a whole lot of information that was already reported to the global catalog in raw JSON, but was previously hard to interpret.  For any WQ application reporting itself to the global catalog (use the -N option), you get a nice display of the workers and tasks running and the total resources consumed across the application:

What's more, a pie chart shows a breakdown of how the master is spending its time.  The main categories are sending data to workers, receiving data from workers, and polling (waiting) for workers to report.  This tells you at a glance what the bottleneck of the system is.

This WQ master is spending most of its time sending data out to workers, so it's close to the limit of its scalability:
However, this one is spending most of its time polling for results, and only a small fraction sending.  It can likely handle many more workers:

This one is spending *all* of its time either receiving data from workers (completed tasks) or sending data to workers for new tasks.  It is completely occupied:
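
To make the reasoning behind the pie chart concrete, here is a minimal Python sketch of how such a breakdown could be computed from a status record. The field names (time_send, time_receive, time_polling), the thresholds, and the sample numbers are assumptions chosen for illustration; they are not taken from the status page or from the catalog's actual JSON.

```python
# Sketch only: field names, thresholds, and sample values are hypothetical.
CATEGORIES = ("time_send", "time_receive", "time_polling")


def time_breakdown(stats):
    """Return the fraction of time spent in each category (fractions sum to 1)."""
    total = sum(stats.get(key, 0) for key in CATEGORIES)
    if total == 0:
        return {key: 0.0 for key in CATEGORIES}
    return {key: stats.get(key, 0) / total for key in CATEGORIES}


def diagnose(stats, polling_threshold=0.5, saturated_threshold=0.95):
    """Classify a master by where its time goes (thresholds are assumptions)."""
    fractions = time_breakdown(stats)
    transfer = fractions["time_send"] + fractions["time_receive"]
    if transfer == 0 and fractions["time_polling"] == 0:
        return "no data"
    if fractions["time_polling"] >= polling_threshold:
        return "mostly polling: can likely handle more workers"
    if transfer >= saturated_threshold:
        return "completely occupied with data transfer"
    return "mostly transferring data: close to its scalability limit"


if __name__ == "__main__":
    # Three made-up records mirroring the three situations described above.
    examples = {
        "sending-heavy": {"time_send": 70, "time_receive": 10, "time_polling": 20},
        "polling-heavy": {"time_send": 5, "time_receive": 5, "time_polling": 90},
        "saturated": {"time_send": 50, "time_receive": 50, "time_polling": 0},
    }
    for name, record in examples.items():
        print(name, time_breakdown(record), diagnose(record))
```

Because only the ratios matter, the times can be in any unit as long as all three use the same one.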
 


Wednesday, August 29, 2018

Announcement: CCTools 7.0.4 released

The Cooperative Computing Lab is pleased to announce the release of version 7.0.4 of the Cooperative Computing Tools including Parrot, Chirp, JX, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a minor release with some bug fixes. Among them:

  • [General] --without-static-libgcc flag to configure script for compilation on Stampede2. (Ben Tovar)
  • [WorkQueue] Consider workers across different factories. (Bo Marchman)
  • [WorkQueue] Input files from tasks that exhausted resources were not being removed from the worker. (Ben Tovar)
  • [Makeflow] Communicate cores, memory, and disk resources to SLURM, SGE, and Torque (Nick Hazekamp)
  • [ResourceMonitor] Fix bug when computing maximum cores. (Ben Tovar)
  • [JX] Improved parsing errors and documentation. (Tim Shaffer, Douglas Thain, Ben Tovar)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Bo Marchman
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Monday, August 20, 2018

DISC REU Videos 2018

Our summer REU students in the DISC program produced an impressive set of videos describing their summer research projects -- check out the playlist!


Wednesday, July 25, 2018

VC3 - Virtual Clusters at PEARC 2018


The VC3 project (virtualclusters.org) allows end users to dynamically create virtual clusters with custom software and middleware, running on top of existing national computing facilities.  Using only standard login access to computing facilities, you can deploy a computing environment specialized for a complex application, and share it with your collaborators.

Today, Prof Thain is presenting the VC3 project at the PEARC 2018 conference, on behalf of the entire VC3 team:


Check out the service online at virtualclusters.org:


We are currently recruiting new users for beta testing.  Please sign up here:

Tuesday, July 24, 2018

Reproducibility in Scientific Computing

"Reproducibility in Scientific Computing", an article by recent grad Peter Ivie, has appeared in the journal ACM Computing Surveys.  This article gives a high-level overview of the technical challenges inherent in making scientific computations reproducible, along with a survey of approaches to attacking these problems. Check it out!


Tuesday, July 10, 2018

Halfway Through 2018 Summer REU


We are a little more than halfway through the 2018 edition of our summer Data Intensive Scientific Computing REU program at the University of Notre Dame.  This summer, our students are working on projects in network science, genome analysis, workflow systems, data visualization, and more.  At this point in the summer, students are finalizing their results and starting to work on posters and videos to present what they have learned for our summer research symposium.

Thursday, July 5, 2018

Announcement: CCTools 7.0.0 released

The Cooperative Computing Lab is pleased to announce the release of version 7.0.0 of the Cooperative Computing Tools including Parrot, Chirp, JX, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download

This is a major release which adds several features and bug fixes. Among them:

  • [General] Catalog updates compressed, and via TCP. (Douglas Thain, Nick Hazekamp, Ben Tovar)
  • [JX] Bug fixes to JX, a superset of JSON to dynamically describe workflows, see doc/jx-tutorial.html. (Tim Shaffer, Douglas Thain)
  • [Makeflow] Formally define and implement hooks to workflow rules. Hooks may be used to wrap rules with containers (e.g. singularity), a monitoring tool, etc.  (Nick Hazekamp, Tim Shaffer)
  • [Makeflow] Rule execution as Amazon Lambda functions and S3 objects. (Kyle Sweeney, Douglas Thain)
  • [Makeflow] Efficient shared file system access. (Nick Hazekamp)
  • [Makeflow] Several bug fixes for rules executing in Mesos. (Chao Zheng)
  • [ResourceMonitor] Several bug fixes. (Ben Tovar)
  • [WorkQueue] Add user-defined features to workers and tasks. (Nate Kremer-Herman)
  • [WorkQueue] Fixes for python3 support. (Ben Tovar)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Kyle Sweeney
  • Chao Zheng

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum

Enjoy!

Monday, June 11, 2018

Papers at ScienceCloud Workshop

CCL grad student Kyle Sweeney is presenting two papers at the ScienceCloud/IWAC workshop at HPDC 2018.


Early Experience Using Amazon Batch for Scientific Workflows describes some practical experience using Amazon Batch for scientific workflows, comparing the performance of straight EC2 virtual machines against Amazon Batch and against overlaying the WorkQueue system on top of virtual machines.





Efficient Integration of Containers into Scientific Workflows explores different methods of composing container images into complex workflows, in order to make efficient use of shared filesystems and data movement.




Monday, June 4, 2018

CCL Internships at CERN and Alibaba

Two of our CCL grad students are off to internships this summer:

Nick Hazekamp will be in Geneva at CERN working with Jakob Blomer and the CVMFS group to develop methods for migrating high throughput workloads to HPC centers, while still making use of the global CVMFS filesystem.

Charles Zheng will be at Alibaba working on high throughput workloads running on container orchestration systems.


Tuesday, May 29, 2018

2018 DISC REU Kickoff

Our 2018 summer program in Data Intensive Scientific Computing (DISC) is underway at the University of Notre Dame.  Eleven students from all around the country are spending the summer working on challenging computing problems in fields such as high energy physics, neuroscience, epidemiology, species distribution, network science, high performance computing, and more.  Welcome to ND!


Friday, May 25, 2018

VC3 Project Limited Beta Opens

Ben Tovar gave a talk introducing the VC3 (Virtual Clusters for Community Computation) project at the annual HTCondor Week conference.

VC3 makes it easy for science groups to deploy custom software stacks across existing university clusters and national facilities.  For example, if you want to run your own private Condor pool across three clusters and share it with your collaborators, then VC3 is for you. 

We are now running VC3 as a "limited beta" for early adopters who would like to give it a try and send us feedback.  Check out the instructions and invitation to sign up.


Tuesday, May 22, 2018

Graduation 2018


It was a busy graduation weekend here at Notre Dame! The CSE department graduated nineteen PhD students, including CCL grads Dr. Peter Ivie and Dr. James Sweet.  Prof. Thain gave the graduation address at the CSE department ceremony. Congratulations and good luck to everyone!





Wednesday, April 25, 2018

VC3-Builder and WQ-MAKER at IC2E 2018

Ben Tovar presented the paper Automatic Dependency Management for Scientific Applications on Clusters and Nick Hazekamp presented the paper MAKER as a Service: Moving HPC applications to Jetstream Cloud at the IEEE International Conference on Cloud Engineering (IC2E 2018) on April 18, 2018 in Orlando, Florida.

In Automatic Dependency Management for Scientific Applications on Clusters (paper slides) we introduce a tool for deploying software environments on clusters. This tool, called the vc3-builder, has minimal dependencies and a light bootstrap, which allows it to be deployed alongside batch jobs. The vc3-builder then installs any missing software using only user privileges (e.g., no sudo) so that the actual user payload can be executed. The vc3-builder is being developed as part of the DOE-funded Virtual Clusters for Community Computation (VC3) project, in which users can construct custom short-lived virtual clusters across different computational sites.

In MAKER as a Service: Moving HPC applications to Jetstream Cloud (paper poster slides) we discuss the lessons learned in migrating MAKER, a traditional HPC application, to the cloud. The work focused on issues like recreating the software stack using VC3-Builder, addressing the lack of shared filesystems and inter-node communication with Work Queue, and building the application around user feedback that allows for informed decisions in the cloud. Using WQ-MAKER we were able to run MAKER not only on Jetstream, but also on resources from Notre Dame's Condor cluster. Below you can see the system architecture.

Monday, March 12, 2018

CCL at CyVerse Container Camp

Nick Hazekamp and Kyle Sweeney gave a talk, "Distributed Computing with Makeflow and Work Queue", at the CyVerse Container Camp workshop. This talk gives an overview of Makeflow and Work Queue, with an emphasis on how we approach using containers in a workflow. The talk highlighted different approaches to applying containers, either per-task or workflow-level, and what that looks like in practice on Jetstream, an OpenStack-based cloud platform.

You can find the slides and tutorial at: CCL at CyVerse Container Camp

Here you can see the active participants:

Monday, January 15, 2018

Submit Your CCL Highlight


We like to hear how you use our software!

If you have discovered a new result, published a paper, given a talk, or just done something cool, please take a few minutes to tell us about it on this simple form.

If we accept your submission, we will highlight your story on our website, include a mention in our annual report to the NSF, and send you some neat CCL stickers and swag as a little thank-you.

You can see what others have submitted on our community highlights page.