Wednesday, March 22, 2017

Ph.D. Defense: Haiyan Meng

Haiyan Meng successfully defended her dissertation titled "Improving the Reproducibility of Scientific Applications with Execution Environment Specifications." Congratulations!


Thursday, March 9, 2017

Makeflow Examples Archive


We recently updated our archive of example Makeflows so that they are significantly easier to download, execute, and reshape to various sizes. For each one, we have instructions on how to obtain the underlying binary program, generate some sample data, and then create a workload of arbitrary size. This allows you to experiment with Makeflow at small scale, and then dial things up when you are ready to run on thousands of nodes:

https://github.com/cooperative-computing-lab/makeflow-examples
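To give a feel for what "a workload of arbitrary size" means, here is a minimal sketch in Python that writes out Makeflow rules for N independent tasks followed by a merge step. It is not taken from the archive: the program name `simulate`, the output file names, and the function `make_workload` are all hypothetical, and the only thing assumed about Makeflow is its Make-like rule syntax (targets, a colon, sources, then a tab-indented command).

```python
def make_workload(n_tasks, program="simulate", prefix="output"):
    """Return Makeflow text with n_tasks independent rules plus a merge rule.

    `program` and `prefix` are hypothetical placeholders; substitute the
    binary and file names of whichever example you are working with.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")

    lines = []
    outputs = []
    for i in range(n_tasks):
        out = f"{prefix}.{i}.txt"
        outputs.append(out)
        # Rule header: the target depends on the program it runs.
        lines.append(f"{out}: {program}")
        # Command line: must be indented with a tab, as in Make.
        lines.append(f"\t./{program} {i} > {out}")
        lines.append("")

    # Final rule: concatenate every task output into a single file.
    joined = " ".join(outputs)
    lines.append(f"all.txt: {joined}")
    lines.append(f"\tcat {joined} > all.txt")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Start small; raise the count once the workflow behaves as expected.
    with open("workload.makeflow", "w") as f:
        f.write(make_workload(10))
```

Changing the single argument from 10 to 10,000 yields the same workflow shape at a much larger scale, which is the idea behind the generators shipped with each example.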


Friday, February 17, 2017

Big CMS Data Analysis at Notre Dame

Analyzing the data produced by the Compact Muon Solenoid (CMS), one of the experiments at the Large Hadron Collider, requires a collaboration of physicists and computer scientists to harness hundreds of thousands of computers at universities and research labs around the world. The contribution of each site to the global effort, whether small or large, is reported on a regular basis.

This recent graph tells an interesting story about contributions to CMS computing in late 2016. Each color in the bar graph represents the core-hours provided by a given site over the course of a week:

The various computing sites are divided into tiers:

  • Tier 0 is CERN, which is responsible for providing data to the lower tiers.
  • Tier 1 contains the national research labs like Fermi National Lab (FNAL), Rutherford Appleton Lab in the UK, and so forth, that facilitate analysis work for universities in their countries.
  • Tier 2 contains universities like Wisconsin, Purdue, and MIT, that have significant shared computing facilities dedicated to CMS data analysis.
  • Tier 3 is everyone else performing custom data analysis, sometimes on private clusters, and sometimes on borrowed resources. Most of those are so small that they are compressed into black at the bottom of the graph.

Now, you would think that the big national sites would produce most of the cycles, but there are a few interesting exceptions at the top of the list.

First, there are several big bursts in dark green that represent the contribution of the HEPCloud prototype, which is technically a Tier-3 operation, but is experimenting with consuming cycles from Google and Amazon. This approach has succeeded at delivering big bursts of computation, and the next question is whether it will be cost-effective over the long term.

Next, the Tier-2 at the University of Wisconsin consistently produces a huge number of cycles from their dedicated facility and opportunistic resources from the Center for High Throughput Computing.  This group works closely with the HTCondor team at Wisconsin to make sure every cycle gets used, 365 days a year.

Following that, you have the big computing centers at CERN and FNAL, which is no surprise.

And then the next contributor is our own little Tier-3 at Notre Dame, which frequently produces more cycles than most of the Tier-2s and some of the Tier-1s! The CMS group at ND harnesses a small dedicated cluster, and then adds unused cycles from our campus Center for Research Computing by using Lobster and the CCL Work Queue software on top of HTCondor.

The upshot is, on a good day, a single grad student from Notre Dame can perform data analysis at a scale that rivals our national computing centers!


Thursday, February 2, 2017

IceCube Flies with Parrot and CVMFS


IceCube is a neutrino detector built at the South Pole by instrumenting about a cubic kilometer of ice with 5160 light sensors. The IceCube data is analyzed by a collaboration of about 300 scientists from 12 countries. Data analysis relies on the precise knowledge of detector characteristics, which are evaluated by vast amounts of Monte Carlo simulation.  On any given day, 1000-5000 jobs are continuously running.

Recently, the experiment began using Parrot to get their code running on GPU clusters at XSEDE sites (Comet, Bridges, and xStream) and the Open Science Grid.  IceCube relies on software distribution via CVMFS, but not all execution sites provide the necessary FUSE modules.  By using Parrot, jobs can attach to remote software repositories without requiring special privileges or kernel modules.

- Courtesy of Gonzalo Merino, University of Wisconsin - Madison
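
As a rough illustration of the Parrot approach described above, the sketch below builds a command line that runs an unmodified program under `parrot_run`, so that paths under `/cvmfs` are served by Parrot rather than by a FUSE mount. This is only a sketch under stated assumptions, not IceCube's actual job wrapper: the helper `parrot_command` is hypothetical, the repository path is assumed for illustration, and any site-specific configuration Parrot may need to locate the repository is omitted.

```python
import shlex


def parrot_command(job_argv, parrot="parrot_run"):
    """Prefix a job's argument list with the Parrot launcher.

    The job itself is unchanged; Parrot intercepts its file accesses in
    user space, so no special privileges or kernel modules are involved.
    """
    if not job_argv:
        raise ValueError("job_argv must contain at least the program name")
    return [parrot] + list(job_argv)


if __name__ == "__main__":
    # Assumed repository path, used here only as an example.
    cmd = parrot_command(["ls", "/cvmfs/icecube.opensciencegrid.org"])
    print(shlex.join(cmd))
    # To actually launch it on a machine with cctools installed:
    #   import subprocess; subprocess.run(cmd, check=True)
```

The same wrapper could be placed in front of a real job script, which is what makes this style of deployment convenient on sites where the FUSE module is unavailable.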