- Casey Robinson presented Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow at the Workshop on Scientific Workflows (WORKS).
- Patrick Donnelly presented Design of an Active Storage Cluster Filesystem for DAG Workflows at the Workshop on Data Intensive Computing Systems (DISCS).
- Prof. Thain gave the opening talk, Toward a Common Model of Highly Concurrent Applications at the Workshop on Many Task Computing on Clusters, Grids, and Supercomputers (MTAGS).
Monday, November 18, 2013
Friday, October 11, 2013
The workshop is an opportunity for beginners and experts alike to learn about the latest software release, get to know the CCL team, meet other people engaged in large scale scientific computing, and influence the direction of our research and development.
Thursday, October 10th
Afternoon: Introduction and Software Tutorials
Friday, October 11th
All Day: New Technology, Scientific Talks, and Discussion
There is no registration fee; however, space is limited, so please register in advance to reserve your place. We hope to see you there!
Wednesday, August 21, 2013
In the original design of Work Queue, each worker was a sequential process that executed one task at a time. This paper describes the extension of Work Queue in two respects:
- Workers can now run multiple tasks simultaneously, each sharing a local cache directory.
- Workers can be combined into hierarchies, each headed by a foreman, which provides a common disk cache for each sub tree.
The effect of these two changes is to dramatically reduce the network footprint at the master process and at each execution site. The resulting system is more 'friendly' to local clusters and is capable of scaling to even greater sizes.
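As a sketch, a foreman can be placed on a cluster head node between the master and the cluster's workers. The flag names below are illustrative assumptions, not exact 4.0 syntax; consult `work_queue_worker --help` in your release for the actual foreman and multi-slot options.

```shell
# Illustrative sketch only: flag names are assumptions; check your
# version's work_queue_worker --help for the exact foreman options.

# Start a foreman on the cluster head node, connecting to the master
# project "myproject" while listening for its own workers:
work_queue_worker --foreman -M myproject -p 9124

# Start multi-slot workers on the cluster nodes, pointing at the
# nearby foreman instead of the distant master:
work_queue_worker --cores 8 headnode.cluster.example.org 9124
```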
Monday, July 29, 2013
This is the first release of the 4.0 series, with some major changes:
- To support new features in Work Queue, backwards compatibility between pre-4.0 masters and workers is broken. Specifically, workers from 4.0 cannot connect to masters from before 4.0, and masters from 4.0 will not accept connections from workers from before 4.0. The API did not change, so unless you want to take advantage of the new features, you should not need to modify your code.
- All code related to WorkQueue has been consolidated to its own library. When linking work queue applications in C, you will need to use: -lwork_queue -ldttools rather than just -ldttools. If you are using the perl or python bindings, no change is necessary.
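For example, a C application that previously linked only against -ldttools would now be built along these lines (paths are illustrative; adjust the prefix to your cctools install):

```shell
# Illustrative build line; CCTOOLS points at your cctools install prefix.
CCTOOLS=$HOME/cctools
gcc my_wq_app.c -o my_wq_app \
    -I"$CCTOOLS/include/cctools" \
    -L"$CCTOOLS/lib" -lwork_queue -ldttools -lm
```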
- The auto-mode option -a for communicating with the catalog server is now deprecated. It is implied when a master or project name (-M, -N) is specified.
- Most tools now support long options at the command line (e.g., --help).
- [WorkQueue] Support for worker hierarchies, with a master-foreman-workers paradigm. [Michael Albrecht]
- [WorkQueue] Multi-slot workers. A worker is now able to handle more than one task at a time. [Michael Albrecht]
- [WorkQueue] Resource reports. A worker reports its resources (disk, memory, and cpu) to the master, and each task in the master can specify a minimum of such resources. [Michael Albrecht, DeVonte Applewhite]
- [WorkQueue] Authentication between master and workers when using the catalog server [Douglas Thain].
- [WorkQueue] Python bindings now include most C API. [Dinesh Rajan]
- [WorkQueue] Several bug fixes and code reorganization. [Dinesh Rajan, Michael Albrecht]
- [WorkQueue] Policies can be specified to work_queue_pool to submit workers on demand. [Li Yu]
- [Makeflow] Support for task categories. A rule can be labeled with a category, and required computational resources (disk, memory, and cpu) can be specified per category. Makeflow then automatically communicates these requirements to Work Queue or Condor. [Ben Tovar]
- [Parrot/Chirp] Support for a search system call added. Search allows for finding files in a number of directories with a shell pattern. See parrot_search for more information. [Patrick Donnelly, Brenden Kokoszka]
- [Parrot] Several bug fixes for cvmfs support. [Douglas Thain, Ben Tovar, Patrick Donnelly]
- [Monitor] A resource monitor/watchdog for computational resources (e.g. disk, memory, cpu, and io) that can be used standalone, or automatically by Makeflow and Work Queue. [Ben Tovar]
- [Monitor] A visualizer that builds a webpage to show the resources histograms from the reports of the resource monitor. [Casey Robinson]
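As a sketch of the new task categories, a Makeflow file might label rules with a category and attach resource requirements to it. The variable names below (CATEGORY, CORES, MEMORY, DISK) are assumptions based on later Makeflow syntax; see the doc/ directory for the authoritative form.

```
# Illustrative sketch; variable names are assumptions, see doc/ for details.
CATEGORY="analysis"
CORES=1
MEMORY=512     # MB
DISK=1024      # MB

stats.out: data.in
	./analyze data.in > stats.out
```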
Please refer to the doc/ directory in the distribution for the usage of these new features. You can download the software here:
Thanks go to the contributors for this release: Michael Albrecht, DeVonte Applewhite, Peter Bui, Patrick Donnelly, Brenden Kokoszka, Kyle Mulholland, Francesco Prelz, Dinesh Rajan, Casey Robinson, Peter Sempolinski, Douglas Thain, Ben Tovar, and Li Yu.
Friday, July 26, 2013
Congratulations to Dr. Li Yu, who successfully defended his Ph.D. dissertation, Right-sizing Resource Allocations for Scientific Applications in Clusters, Grids, and Clouds!
Monday, July 22, 2013
We will be offering a tutorial titled Building Scalable Scientific Applications using Makeflow and Work Queue as part of XSEDE 2013 in San Diego on July 22.
Saturday, June 1, 2013
- Jesus Izaguirre, University of Notre Dame and Eric Darve, Stanford University
Tuesday, May 21, 2013
Friday, March 22, 2013
Dinesh Rajan will present a tutorial on Building Elastic Applications with Makeflow and Work Queue as part of CCGrid 2013 in Delft, the Netherlands on May 13th. Come join us and learn how to write applications that scale up to hundreds or thousands of nodes running on clusters, clouds, and grids.
Dinesh Rajan will present his paper Case Studies in Designing Elastic Applications at the IEEE International Conference on Clusters, Clouds, and Grids (CCGrid) in Delft, the Netherlands. This work was done in collaboration with Andrew Thrasher and Scott Emrich from the Notre Dame Bioinformatics Lab, and Badi Abdul-Wahid and Jesus Izaguirre from the Laboratory for Computational Life Sciences.
The paper describes our experience in designing three different elastic applications -- E-MAKER, Elastic Replica Exchange, and Folding at Work -- that run on hundreds to thousands of cores using the Work Queue framework. The paper offers six guidelines for designing similar applications:
- Abolish shared writes.
- Keep your software close and your dependencies closer.
- Synchronize two, you make company; synchronize three, you make a crowd.
- Make tasks of a feather flock together.
- Seek simplicity, and gain power.
- Build a model before scaling new heights.
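The first guideline, "abolish shared writes", can be illustrated with a minimal standalone sketch (not Work Queue code itself): each task writes only to its own private output file, and the master merges results in a separate step, so no two tasks ever contend for the same file. All names here are invented for illustration.

```python
import os
import tempfile

# Each "task" writes to its own private output file (no shared writes).
def run_task(task_id, workdir):
    out = os.path.join(workdir, "task_%d.out" % task_id)
    with open(out, "w") as f:
        f.write("result-%d\n" % task_id)
    return out

# The master merges private outputs in a single, serial reduce step.
def merge(outputs, workdir):
    merged = os.path.join(workdir, "merged.out")
    with open(merged, "w") as f:
        for path in sorted(outputs):
            with open(path) as part:
                f.write(part.read())
    return merged

workdir = tempfile.mkdtemp()
outputs = [run_task(i, workdir) for i in range(3)]
merged = merge(outputs, workdir)
```

Because every writer owns its file, tasks can run on any node in any order without locking or a shared filesystem.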
Thursday, March 21, 2013
A recent article in IEEE Transactions on Parallel and Distributed Systems describes our work in collaboration with the Notre Dame Bioinformatics Laboratory on SAND - The Scalable Assembler at Notre Dame.
In this article, we describe how to refactor the standard Celera genome assembly pipeline into a scalable computation that runs on thousands of distributed cores using the Work Queue framework. By explicitly handling the data dependencies between tasks, we are able to significantly improve runtime over Celera on a standard cluster. In addition, this technique allows the user to break free of the shared filesystem and run on hundreds or thousands of nodes drawn from clusters, clouds, and grids.
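A toy sketch (not SAND itself) of what "explicitly handling data dependencies" means: given each task's declared inputs, a scheduler can release a task the moment its dependencies complete, rather than relying on a shared filesystem to serialize the pipeline. The task names below are invented for illustration.

```python
from collections import deque

# Each task lists the tasks whose outputs it consumes (names are invented).
deps = {
    "overlap": [],
    "align_a": ["overlap"],
    "align_b": ["overlap"],
    "assemble": ["align_a", "align_b"],
}

def schedule(deps):
    """Release tasks in dependency order (a simple topological sort)."""
    remaining = {t: set(d) for t, d in deps.items()}
    ready = deque(t for t, d in remaining.items() if not d)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        # A finished task may unblock tasks that were waiting on it.
        for u, d in remaining.items():
            if t in d:
                d.remove(t)
                if not d:
                    ready.append(u)
    return order

order = schedule(deps)
```

Independent tasks (here, the two alignment steps) become eligible at the same time and can run on separate workers.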
Monday, February 18, 2013
The Cooperative Computing Lab is pleased to announce the release of version 3.7.0 of the Cooperative Computing Tools, including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, and other software.
The software may be downloaded here.
This is a minor release which adds numerous features and fixes several bugs:
- [WorkQueue] It is now possible to specify chunks (pieces) of an input file to be used as input for worker tasks. [Dinesh Rajan]
- [Chirp] File extended attributes are now supported. [Patrick Donnelly]
- [Makeflow] New -i switch now outputs pre-execution analysis of Makeflow DAG. [Li Yu]
- [WorkQueue/Makeflow] Support for submitting tasks to the PBS batch submission platform added. [Dinesh Rajan]
- [Makeflow] makeflow_log_parser now ignores comments in Makeflow logs. [Andrew Thrasher]
- [Catalog] New catalog_update which reports information to a catalog server. [Peter Bui, Dinesh Rajan]
- [WorkQueue] Various minor tweaks made to the API. [Li Yu, Dinesh Rajan]
- [Catalog/WorkQueue] Support added for querying workers and tasks at run-time. [Douglas Thain]
- [WorkQueue] Many environment variables removed in favor of option manipulation API. [Li Yu]
- [Makeflow] Deprecated -t option (capacity tolerance) removed.
- [WorkQueue] -W (worker status) now has working_dir and current_time fields.
- [WorkQueue] -T (task status) now reports working_dir, current_time, address_port, submit_to_queue_time, send_input_start_time, execute_cmd_start_time. [Li Yu]
- [WorkQueue] -Q (queue status) now reports working_dir.
- [Makeflow] Input file (dependency) renaming supported with new "->" operator. [Michael Albrecht, Ben Tovar]
- [WorkQueue] work_queue_pool now supports a new -L option to specify a log file. [Li Yu]
- [WorkQueue] Tasks are now killed using SIGKILL.
- [WorkQueue] Protocol based keep-alives added to workers. [Dinesh Rajan]
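Of the features above, the new "->" renaming operator can be sketched in a Makeflow rule as follows; the exact syntax is an assumption based on the Makeflow manual, so check the documentation for your version.

```
# Sketch: input.dat is delivered to the execution site as data.in.
output.dat: input.dat->data.in
	./process data.in > output.dat
```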
Thanks go to the contributors for many minor features and bug fixes:
- Michael Albrecht
- Peter Bui
- Patrick Donnelly
- Brian Du Sell
- Kyle Mulholland
- Dinesh Rajan
- Douglas Thain
- Andrew Thrasher
- Ben Tovar
- Li Yu
Please send any feedback to the CCTools discussion mailing list.
Monday, February 11, 2013
The Cooperative Computing Lab is pleased to announce the release of version 3.6.2 of the Cooperative Computing Tools, including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, and other software.
This is a bug fix release of version 3.6.1. No new features were added.
The software may be downloaded here:
- [WorkQueue] Corrected memory errors leading to a SEGFAULT. [Li Yu]
- [Makeflow] Properly interpret escape codes in Makeflow files: \n, \t, etc. [Brian Du Sell]
- [Parrot] Watchdog now properly honors minimum wait time. [Li Yu]
- [Parrot] Reports the logical executable name for /proc/self/exe instead of the physical name. [Douglas Thain]
- [WorkQueue] Race conditions in signal handling for workers were corrected. Tasks now have a unique process group to properly kill all task children on abort. [Dinesh Rajan, Li Yu]
- [WorkQueue] Corrected handling of the -C option, where a worker would not use the same catalog server as work_queue_pool. [Li Yu]
Thanks go to the contributors for this release: Patrick Donnelly, Brian Du Sell, Dinesh Rajan, Douglas Thain, and Li Yu.
Tuesday, January 15, 2013
- Ronald J. Nowling and Jesus A. Izaguirre, University of Notre Dame
Tuesday, January 1, 2013
- Eric Lyons, University of Arizona