Monday, May 23, 2016

Containers, Workflows, and Reproducibility

The DASPOS project hosted a workshop on Container Strategies for Data and Software Preservation that Promote Open Science at Notre Dame on May 19-20, 2016. It brought together researchers and practitioners who all work on problems related to reproducibility, but who presented different approaches and technologies.

Prof. Thain presented recent work by CCL grad students Haiyan Meng and Peter Ivie on Combining Containers and Workflow Systems for Reproducible Execution.

The Umbrella tool created by Haiyan Meng allows for a simple, compact, archival representation of a computation, taking into account hardware, operating system, software, and data dependencies. This makes it possible to reproduce computational experiments accurately and to give each one a DOI, so that it can be shared, downloaded, and executed.
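To make the idea of a compact, archival description more concrete, here is a minimal sketch in Python. It is not Umbrella's actual file format or API: the field names, the example values, and the `spec_digest` helper are all assumptions made for illustration. It only shows how a computation's hardware, operating system, software, and data dependencies might be written down in one document, and how a stable fingerprint of that document could be derived.

```python
import hashlib
import json

# Hypothetical specification of one computation. The keys and values are
# illustrative assumptions, not Umbrella's real schema.
spec = {
    "hardware": {"arch": "x86_64", "cores": 1, "memory": "2GB"},
    "os": {"name": "redhat", "version": "6.5"},
    "software": {"example-simulator": {"version": "1.0"}},
    "data": {"input.dat": {"checksum": "placeholder-checksum"}},
    "cmd": "./simulate input.dat > output.dat",
}


def spec_digest(spec):
    """Return a SHA-256 hex digest of the spec's canonical JSON form.

    Sorting keys and fixing separators means two specs with the same
    content always produce the same digest, whatever the key order.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(spec_digest(spec))
```

A fingerprint like this is not a DOI, but it illustrates why a complete, self-contained description matters: any change to a dependency changes the description, so two people holding the same document are talking about the same computation.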

The PRUNE tool created by Peter Ivie allows one to construct dynamic workflows of connected tasks, each one precisely specified together with its execution environment. Provenance and data are tracked precisely, so that the specification of a workflow (and its results) can be exported, imported, and shared with other people. Think of it as git for workflows.
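The git comparison can be illustrated with a small Python sketch. This is not PRUNE's interface or implementation: the `Workflow` class, its methods, and the placeholder environment name are hypothetical, and content-derived identifiers are an assumption borrowed from the git analogy. The sketch shows tasks that are identified by their command, environment, and inputs, so that the whole workflow can be exported as plain data and the lineage of any result can be traced.

```python
import hashlib
import json


def content_id(obj):
    """Derive a short identifier from the canonical JSON form of obj."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class Workflow:
    """Hypothetical workflow store; not PRUNE's real data model."""

    def __init__(self):
        self.tasks = {}  # task id -> task record

    def add_file(self, data):
        """Return a content-derived id for a piece of raw input data."""
        return content_id({"data": data})

    def add_task(self, command, environment, inputs):
        """Record a task and return its id.

        The id depends on the command, the environment, and the input
        ids, so adding the same task twice yields the same id.
        """
        record = {
            "command": command,
            "environment": environment,
            "inputs": list(inputs),
        }
        task_id = content_id(record)
        self.tasks.setdefault(task_id, record)
        return task_id

    def lineage(self, task_id):
        """Return ids of all tasks that task_id depends on."""
        seen = []
        stack = [task_id]
        while stack:
            current = stack.pop()
            for dep in self.tasks.get(current, {}).get("inputs", []):
                if dep in self.tasks and dep not in seen:
                    seen.append(dep)
                    stack.append(dep)
        return seen

    def export(self):
        """Serialize the workflow specification as JSON text."""
        return json.dumps(self.tasks, sort_keys=True, indent=2)


# Usage: a two-step workflow over one input file.
wf = Workflow()
raw = wf.add_file("1 2 3")
env = "env-placeholder"  # stands in for a full environment description
step1 = wf.add_task("sort input", env, [raw])
step2 = wf.add_task("summarize input", env, [step1])
print(wf.lineage(step2))
print(wf.export())
```

Because every identifier is derived from content, the exported JSON can be loaded elsewhere and will describe exactly the same tasks, which is the property that makes sharing a workflow meaningful.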
