This summer, we hosted nine outstanding undergraduate students in our summer REU program in Data Intensive Scientific Computing (DISC). They spent the summer working with faculty in labs across campus in fields such as astronomy, high energy physics, bioinformatics, data visualization, and distributed systems. And, they enjoyed some summer fun around South Bend.
Check out these short YouTube clips that explain each research project:
And here they are presenting at our summer research symposium:
If you would like to participate, please apply for the 2018 edition of the DISC REU program at Notre Dame.
The Cooperative Computing Lab is pleased to announce the release of version 6.1.6 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.
Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC. This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.
The Cooperative Computing Lab is pleased to announce the release of version 6.1.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, Prune, SAND, All-Pairs, Weaver, and other software.
We recently updated our archive of example Makeflows so that they are significantly easier to download, execute, and reshape to various sizes. For each one, we provide instructions on how to obtain the underlying binary program, generate some sample data, and then create a workload of arbitrary size. This allows you to experiment with Makeflow at a small scale, and then dial things up when you are ready to run on thousands of nodes:
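To give a flavor of the format, here is a minimal sketch of a Makeflow file, not taken from the archive; the program simulate.py and the file names are hypothetical. Each rule names its output files, its input files, and the command that produces the outputs:

    # example.makeflow: a tiny two-task workflow (all names hypothetical).
    # Syntax is Make-like: outputs, a colon, inputs, then an indented command.

    output_1.txt: simulate.py
        python simulate.py --task 1 > output_1.txt

    output_2.txt: simulate.py
        python simulate.py --task 2 > output_2.txt

You can run this locally with "makeflow example.makeflow", or send the very same file to a cluster with "makeflow -T condor example.makeflow"; scaling up is just a matter of generating more rules.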
Analyzing the data produced by the Compact Muon Solenoid (CMS), one of the experiments at the Large Hadron Collider, requires a collaboration of physicists and computer scientists to harness hundreds of thousands of computers at universities and research labs around the world. The contribution of each site to the global effort, whether small or large, is reported on a regular basis.
This recent graph tells an interesting story about contributions to CMS computing in late 2016. Each color in the bar graph represents the core-hours provided by a given site over the course of a week:
The various computing sites are divided into tiers:
Tier 0 is CERN itself, which is responsible for providing data to the lower tiers.
Tier 1 consists of large national facilities, such as Fermilab (FNAL) in the United States, that store and process data on behalf of the experiment.
Tier 2 contains universities like Wisconsin, Purdue, and MIT that have significant shared computing facilities dedicated to CMS data analysis.
Tier 3 is everyone else performing custom data analysis, sometimes on private clusters and sometimes on borrowed resources. Most of these sites are so small that they are compressed into the black band at the bottom of the graph.
Now, you would think that the big national sites would produce most of the cycles, but there are a few interesting exceptions at the top of the list.
First, there are several big bursts in dark green that represent the contribution of the HEPCloud prototype, which is technically a Tier-3 operation, but is experimenting with consuming cycles from Google and Amazon. This approach has been successful at delivering large bursts of computation; the open question is whether it will be cost-effective over the long term.
Following that, you have the big computing centers at CERN and FNAL, which is no surprise.
And then the next contributor is our own little Tier-3 at Notre Dame, which frequently produces more cycles than most of the Tier-2s and some of the Tier-1s! The CMS group at ND harnesses a small dedicated cluster, and then adds unused cycles from our campus Center for Research Computing by running Lobster and the CCL Work Queue software on top of HTCondor.
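To give a rough idea of how this works, here is a minimal sketch of a Work Queue master written against the Python API; the ./analyze program and the data file names are hypothetical stand-ins for a real CMS analysis code:

    # A minimal Work Queue master (sketch only); ./analyze and the
    # input/output files are hypothetical placeholders.
    import work_queue as wq

    q = wq.WorkQueue(port=9123)          # workers connect back to this port
    print("master listening on port %d" % q.port)

    for i in range(100):
        # Each task is an ordinary command; Work Queue ships the listed
        # input and output files between the master and the workers.
        t = wq.Task("./analyze input_%d.dat > output_%d.txt" % (i, i))
        t.specify_input_file("analyze")
        t.specify_input_file("input_%d.dat" % i)
        t.specify_output_file("output_%d.txt" % i)
        q.submit(t)

    while not q.empty():
        t = q.wait(5)                    # completed task, or None on timeout
        if t:
            print("task %d exited with status %d" % (t.id, t.return_status))

Workers can then be added from whatever resources happen to be available, for example by submitting them to HTCondor with "condor_submit_workers HOSTNAME 9123 100". The master does not care where the workers run, which is what makes it easy to mix a small dedicated cluster with opportunistic campus machines.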
The upshot is, on a good day, a single grad student from Notre Dame can perform data analysis at a scale that rivals our national computing centers!