Monday, November 23, 2015

CCTools 5.3.0 released

The Cooperative Computing Lab is pleased to announce the release of version 5.3.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, Umbrella, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://www.cse.nd.edu/~ccl/software/download

This minor release adds several features and bug fixes. Among them:

  • [Makeflow]  Several enhancements in garbage collection. (Nick Hazekamp)
  • [Makeflow]  Better task state handling when recovering execution log. (Nick Hazekamp)
  • [Parrot]    Correct handling of multi-threaded programs. (Patrick Donnelly)
  • [Parrot]    Adds parrot_mount, to set arbitrary mount points while parrot is executing. (Douglas Thain)
  • [Parrot]    Add --fake-setuid option, for executables that request setuid. (Tim Shaffer)
  • [Parrot]    Update cvmfs uri to new convention. (Jakob Blomer)
  • [Parrot]    Add --whitelist to restrict filesystem access. (Tim Shaffer)
  • [Parrot]    Make special file descriptors invisible to tracee. (Patrick Donnelly)
  • [Resource Monitor] Compute approximations of shared resident memory. (Ben Tovar)
  • [Resource Monitor] Remove resource_monitorv for static binaries. resource_monitor now handles all cases.  (Ben Tovar)
  • [Resource Monitor] Working directories are not tracked by default anymore. Use --follow-chdir instead.  (Ben Tovar)
  • [Umbrella]  Support for curateND. (Haiyan Meng)
  • [Umbrella]  Support for installing software from package managers. (Haiyan Meng)
  • [Work Queue] Adds option -C to read a JSON configuration file. (Ben Tovar)
  • [Work Queue] Several bug fixes regarding task/workflow statistics. (Ben Tovar)
  • [Work Queue] Master's shutdown option now correctly terminates workers. (Ben Tovar)
  • [Work Queue] Adds --sge-parameter to the ./configure script to personalize the sge_submit_workers script. (Ben Tovar)
  • [Work Queue] Adds the executable disk_allocator, to restrict disk usage at the workers. (Nate Kremer-Herman)
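As a quick illustration, the two new Parrot options above might be combined like this. The paths and the whitelist argument format here are illustrative only; check parrot_run --help for the exact spelling:

```sh
# Run a command under Parrot, pretending that setuid requests succeed
# (--fake-setuid) while restricting filesystem access to a whitelist.
# The path list syntax below is an assumption, not verified against the manual.
parrot_run --fake-setuid --whitelist /tmp:/home/user/data -- /bin/sh -c 'ls /tmp'
```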


Incompatibility warnings:

  • Workers from this release do not work correctly with masters from previous releases.


Thanks go to the contributors for many features, bug fixes, and tests:

  • Jakob Blomer
  • Neil Butcher
  • Patrick Donnelly
  • Nathaniel Kremer-Herman
  • Nicholas Hazekamp
  • Peter Ivie
  • Kevin Lannon
  • Haiyan Meng
  • Tim Shaffer
  • Douglas Thain
  • Ben Tovar
  • Mathias Wolf
  • Anna Woodard

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/software/help/

Enjoy!

Monday, November 16, 2015

Analyzing LHC Data on 10K Cores with Lobster

Prof. Thain gave a talk titled Analyzing LHC Data on 10K Cores with Lobster at the Workshop on Data Intensive Computing in the Clouds at Supercomputing 2015.  The talk gave an overview of our collaboration with members of the CMS experiment at Notre Dame.  Together, we have built a data analysis system which can deploy the complex CMS computing environment on large clusters of non-dedicated machines.





Monday, November 9, 2015

Global Filesystems Paper in IEEE CiSE

Our latest paper, in collaboration with Jakob Blomer and the CVMFS team at CERN, describes the evolution of global-scale filesystems to serve the needs of the world-wide LHC experiment collaborations:



Delivering complex software across a worldwide distributed system is a major challenge in high-throughput scientific computing. The problem arises at different scales for many scientific communities that use grids, clouds, and distributed clusters to satisfy their computing needs. For high-energy physics (HEP) collaborations dealing with large amounts of data that rely on hundreds of thousands of cores spread around the world for data processing, the challenge is particularly acute. To serve the needs of the HEP community, several iterations were made to create a scalable, user-level filesystem that delivers software worldwide on a daily basis. The implementation was designed in 2006 to serve the needs of one experiment running on thousands of machines. Since that time, this idea evolved into a new production global-scale filesystem serving the needs of multiple science communities on hundreds of thousands of machines around the world. 

Tuesday, November 3, 2015

Preservation Talk at iPres 2015

Prof. Thain gave a talk titled "Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness" at the 2015 Conference on Digital Preservation.  This talk gives a high level overview of our work on preservation, encompassing packaging with Parrot, environment specification with Umbrella, and workflow preservation with Prune.

Tuesday, October 20, 2015

CMS Case Study Paper at CHEP

Our case study work on how to preserve and reproduce a high energy physics (HEP) application with Parrot has been accepted by the Journal of Physics: Conference Series (JPCS 2015).

The HEP application under investigation, TauRoast, was written by our physics collaborator, Matthias Wolf. TauRoast is a complex single-machine application with many implicit and explicit dependencies drawn from CVS, GitHub, PyYAML websites, personal websites, CVMFS, AFS, HDFS, NFS, and PanFS. The total size of these dependencies is about 166.8TB.

To make TauRoast reproducible, we propose a fine-grained dependency management toolkit, based on Parrot, that tracks the data actually used and creates a reduced package stripped of all unused data. In this way, the original 166.8TB execution environment is reduced to a 21GB package. The correctness of the preserved package is demonstrated in three different environments: the original machine, a virtual machine on the Notre Dame Cloud Platform, and a virtual machine on Amazon EC2.


Monday, October 19, 2015

OpenMalaria Preservation with Umbrella

Haiyan, working together with Alexander from the CRC, successfully preserved and reproduced a C++ application, OpenMalaria, using Umbrella. The data dependencies of OpenMalaria include packages from yum repositories, OS images from the CCL website, and software and data dependencies from curateND. Through a JSON-format specification, Umbrella allows users to specify the complete execution environment for an application: hardware, kernel, OS, software, data, packages supported by package managers, environment variables, and command. Each dependency in an Umbrella specification also carries metadata: size, format, checksum, download URLs, and mount points. At runtime, Umbrella constructs the execution environment given in the specification with the help of sandboxing techniques such as Parrot and Docker, and runs the user's task.
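To give a flavor of what such a specification looks like, here is a rough sketch built in Python. The key names, URLs, and values below are hypothetical, chosen only to mirror the components listed above (hardware, kernel, OS, software, environment variables, command, and per-dependency metadata); consult the Umbrella documentation for the real schema:

```python
import json

# Hypothetical sketch of an Umbrella-style JSON specification.
# Each dependency carries metadata such as its source URL, checksum,
# and mount point; none of these names are taken from the real schema.
spec = {
    "hardware": {"arch": "x86_64", "cores": 1, "memory": "2GB", "disk": "10GB"},
    "kernel":   {"name": "linux", "version": ">=2.6.32"},
    "os": {
        "name": "redhat", "version": "6.5",
        "source": ["http://example.org/images/redhat-6.5.tar.gz"],
        "checksum": "0123abcd",
    },
    "software": {
        "openmalaria": {
            "mountpoint": "/opt/openmalaria",
            "source": ["http://example.org/openmalaria.tar.gz"],
        },
    },
    "environ": {"PATH": "/opt/openmalaria/bin:/usr/bin:/bin"},
    "cmd": "openMalaria --scenario scenario.xml",
}

# Serialize the specification so it can be handed to Umbrella as a file.
print(json.dumps(spec, indent=2))
```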

Tuesday, October 13, 2015

DAGVz Paper at Visual Performance Analysis Workshop

An Huynh will be presenting a paper on the visualization of task-parallel programs at the Visual Performance Analysis workshop at Supercomputing 2015.  (He is a student of Kenjiro Taura at the University of Tokyo who spent a semester working with us in the Cooperative Computing Lab.)

An developed DAGViz, a tool for exploring the rich structure of task parallel programs, which requires associating DAG structure with performance measures at multiple levels of detail.  This allows the analyst to zoom in to trouble spots and view exactly where and how each basic block of a program ran on a multi-core machine.




Wednesday, September 9, 2015

Virtual Wind Tunnel in IEEE CiSE

Some of our recent work on a system for collaborative engineering design was featured in the September issue of IEEE Computing in Science and Engineering, focused on "Open Simulation Laboratories". This project was part of a collaboration between faculty in the computer science and civil engineering departments: Open Sourcing the Design of Civil Infrastructure.

CCL grad student Peter Sempolinski led the design and implementation of an online service enabling collaborative design and evaluation of structures, known as the "Virtual Wind Tunnel".  This service enables structural designs to be uploaded and shared, then evaluated for performance via the OpenFOAM CFD package.  The entire process is similar to that of collaborative code development, where the source (i.e. a building design) is kept in a versioned repository, automated builds (i.e. building simulation) are performed in a consistent and reproducible way, and test results (i.e. simulation metrics) are used to evaluate the initial design.  Designs and results can be shared, annotated, and re-used, making it easy for one engineer to build upon the work of another.

The prototype system has been used in a variety of contexts, most notably to demonstrate the feasibility of crowdsourcing design and evaluation work via Amazon Mechanical Turk.








Monday, September 7, 2015

Three Papers at IEEE Cluster in Chicago

This week, at the IEEE Cluster Computing conference in Chicago, Ben Tovar will present some of our work on automated application monitoring:

(PDF) Gideon Juve, Benjamin Tovar, Rafael Ferreira da Silva, Dariusz Krol, Douglas Thain, Ewa Deelman, William Allcock, and Miron Livny, Practical Resource Monitoring for Robust High Throughput Computing, Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications at IEEE Cluster Computing, September, 2015. 

Matthias Wolf will present our work on the Lobster large scale data management system:

(PDF) Anna Woodard, Matthias Wolf, Charles Mueller, Nil Valls, Ben Tovar, Patrick Donnelly, Peter Ivie, Kenyi Hurtado Anampa, Paul Brenner, Douglas Thain, Kevin Lannon and Michael Hildreth,
Scaling Data Intensive Physics Applications to 10k Cores on Non-Dedicated Clusters with Lobster, IEEE Conference on Cluster Computing, September, 2015.

Olivia Choudhury will present some work on modelling concurrent applications, trading off thread-level parallelism against task-level parallelism at scale:

(PDF) Olivia Choudhury, Dinesh Rajan, Nicholas Hazekamp, Sandra Gesing, Douglas Thain, and Scott Emrich,
Balancing Thread-level and Task-level Parallelism for Data-Intensive Workloads on Clusters and Clouds,
IEEE Conference on Cluster Computing, September, 2015.


Wednesday, August 19, 2015

CCTools 5.2.0 released

The Cooperative Computing Lab is pleased to announce the release of version 5.2.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:

http://www.cse.nd.edu/~ccl/software/download

This minor release addresses the following issues from version 5.1.0:

  • [Chirp]     Fix mkdir python binding. (Ben Tovar)
  • [Chirp]     Adds 'ln' for file links. (Nate Kremer-Herman)
  • [Chirp/Confuga] Kill a job even on failure. (Patrick Donnelly)
  • [Debug]     Fix log rotation with multiple processes. (Patrick Donnelly)
  • [Makeflow]  Better support for Torque and SLURM for XSEDE. (Nick Hazekamp)
  • [Parrot]    Fix bug where cvmfs alien cache access was sequential. (Ben Tovar)
  • [Parrot]    Allow compilation with iRODS 4.1. (Ben Tovar)
  • [WorkQueue] Improvements to statistics when using foremen. (Ben Tovar)
  • [WorkQueue] Fix bug related to exporting environment variables. (Ben Tovar)
  • [WorkQueue] Fix bug where task sandboxes were not deleted at workers. (Ben Tovar)

Thanks go to our contributors:

Patrick Donnelly
Nathaniel Kremer-Herman
Nicholas Hazekamp
Ben Tovar

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/software/help/

Enjoy!

Tuesday, August 18, 2015

Recent CCL Grads Take Faculty Positions

Peter Bui is returning to Notre Dame this fall, where he will be a member of the teaching faculty and will be teaching undergraduate core classes like data structures, discrete math, and more.  Welcome back, Prof. Bui!






Hoang Bui completed a postdoc position at Rutgers University with Prof. Manish Parashar, and is starting as an assistant professor at Western Illinois University.  Congratulations, Prof. Bui!

Friday, August 14, 2015

CMS Analysis on 10K Cores Using Lobster

We have been working closely with the CMS physics group at Notre Dame for the last year to build Lobster, a data analysis system that runs on O(10K) cores to process data produced by the CMS experiment at the LHC.  At peak, Lobster at ND delivers capacity equal to that of a dedicated CMS Tier-2 facility!

Existing data analysis systems for CMS generally require that the user be running in a cluster that has been set up just so for the purpose: exactly the right operating system, certain software installed, various user identities present, and so on. This is fine for the various clusters dedicated to the CMS experiment, but it leaves unused the enormous amount of computing power that can be found at university computing centers (like the ND CRC), national computing resources (like XSEDE or the Open Science Grid), and public cloud systems.

Lobster is designed to harness clusters that are not dedicated to CMS.  This requires solving two problems:
  1. The required software and data are not available on every node.  Instead, Lobster must bring them in at runtime and create the necessary execution system on the fly.
  2. A given machine may only be available for a short interval of time before it is taken away and assigned to another user, so Lobster must be efficient at getting things set up, and handy at dealing with disconnections and failures.
To do this, we build upon a variety of technologies for distributed computing.  Lobster uses Work Queue to dispatch tasks to thousands of machines, Parrot with CVMFS to deliver the complex software stack from CERN, XRootD to deliver the LHC data, and Chirp and Hadoop to manage the output data.

Lobster runs effectively on O(10K) cores so far, depending on the CPU/IO ratio of the jobs.  These two graphs show the behavior of a production run on top of HTCondor at Notre Dame, hitting up to 10K cores over the course of a 48-hour run.  The top graph shows the number of tasks running simultaneously, while the bottom shows the number of tasks completed or failed in each 10-minute interval.  Note that about two thirds of the way through, there is a big hiccup, due to an external network outage.  Lobster accepts the failures and keeps on going.

Lobster has been a team effort between Physics, Computer Science, and the Center for Research Computing: Anna Woodard and Matthias Wolf have taken the lead in developing the core software; Ben Tovar, Patrick Donnelly, and Peter Ivie have improved and debugged Work Queue, Parrot, and Chirp along the way; Charles Mueller, Nil Valls, Kenyi Anampa, and Paul Brenner have all worked to deploy the system at scale in production; Kevin Lannon, Michael Hildreth, and Douglas Thain provide the project leadership.


Anna Woodard, Matthias Wolf, Charles Nicholas Mueller, Ben Tovar, Patrick Donnelly, Kenyi Hurtado Anampa, Paul Brenner, Kevin Lannon, and Michael Hildreth, Exploiting Volatile Opportunistic Computing Resources with Lobster, Computing in High Energy Physics, January, 2015.

Anna Woodard, Matthias Wolf, Charles Mueller, Nil Valls, Ben Tovar, Patrick Donnelly, Peter Ivie, Kenyi Hurtado Anampa, Paul Brenner, Douglas Thain, Kevin Lannon and Michael Hildreth, Scaling Data Intensive Physics Applications to 10k Cores on Non-Dedicated Clusters with Lobster, IEEE Conference on Cluster Computing, September, 2015.

Thursday, July 16, 2015

Haipeng Cai Defends Ph.D.

Haipeng Cai successfully defended his dissertation, "Cost-effective Dependence Analyses for Reliable Software Evolution", which studied methods for efficiently determining the scope of a complex software system that is affected by a given change.

Haipeng will be taking a postdoctoral research position at Virginia Tech under the supervision of Prof. Barbara Ryder.

Congratulations to Dr. Haipeng Cai!


CCTools 5.1.0 released

The Cooperative Computing Lab is pleased to announce the release of version 5.1.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, Weaver, and other software.

The software may be downloaded here:
http://www.cse.nd.edu/~ccl/software/download

This minor release adds a couple of small features, and fixes the following
issues of version 5.0.0:

  • [Prune]     Fix installation issue. (Haiyan Meng)
  • [Umbrella]  Fix installation issue. (Haiyan Meng)
  • [WorkQueue] Adds the worker --wall-time option to specify the maximum period of time a worker may be active. (Andrey Tovchigrechko, Ben Tovar)
  • [WorkQueue] Adds work_queue_status -M to show the status of masters by name (names may be regular expressions). (Ben Tovar)
  • [WorkQueue] Fix missing priority python binding.
  • [WorkQueue] Fix incorrect reset of workers when connecting to different masters. (Ben Tovar)
  • [WorkQueue] Fix segmentation fault when cloning tasks. (Ben Tovar)
  • [WQ_Maker]  Cleanup, and small fixes. (Nick Hazekamp)
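For instance, the status of all masters whose project names match a pattern can be queried like this (the project name is made up, and the exact flag spelling should be checked against work_queue_status --help):

```sh
# Query the catalog for masters whose project name matches a regular expression.
work_queue_status -M 'lobster.*'
```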

Thanks go to our contributors:

Nicholas Hazekamp
Haiyan Meng
Ben Tovar
Andrey Tovchigrechko

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/software/help/

Enjoy!


Tuesday, July 7, 2015

CCTools 5.0.0 released

The Cooperative Computing Lab is pleased to announce the release of version 5.0.0 of the Cooperative Computing Tools including Parrot, Chirp, Makeflow, WorkQueue, SAND, All-Pairs, Weaver, and other software.
The software may be downloaded here: http://www.cse.nd.edu/~ccl/software/download
This is a major release that incorporates the preview of three new tools:
  • [Confuga] An active storage cluster file system built on top of Chirp. It is used as a collaborative distributed file system and as a platform for execution of scientific workflows with full data locality for all job dependencies. (Patrick Donnelly)
  • [Umbrella] A tool for specifying and materializing comprehensive execution environments. Once a task is specified, Umbrella determines the minimum mechanism necessary to run it such as, direct execution, a system container, a local virtual machine, or submission to a cloud or grid environment. (Haiyan Meng).
  • [Prune] A system for executing and precisely preserving scientific workflows. Collaborators can verify research results and easily extend them at a granularity determined by the user. (Peter Ivie)
This release adds several features and several bug fixes. Among them:
  • [AllPairs] Support for symmetric matrices. (Haiyan Meng)
  • [Chirp] Perl and python bindings. (Ben Tovar)
  • [Chirp] Improvements to the job interface. (Patrick Donnelly)
  • [Makeflow] Improved Graphviz's dot output. (Nate Kremer-Herman)
  • [Makeflow] Support for command wrappers. (Douglas Thain)
  • [Parrot] Several bug fixes for CVMFS-based applications. (Jakob Blomer, Patrick Donnelly)
  • [Parrot] Valgrind support. (Patrick Donnelly)
  • [Resource Monitor] Library for polling resources. (Ben Tovar)
  • [WorkQueue] Signal handling bug fixes. (Andrey Tovchigrechko)
  • [WorkQueue] Log visualizer. (Ryan Boccabella)
  • [WorkQueue] work_queue_worker support for Docker. (Charles Zheng)
  • [WorkQueue] Improvements to perl bindings. (Ben Tovar)
  • [WorkQueue] Support to blacklist workers. (Nick Hazekamp)
Incompatibility warnings: Workers from 5.0 do not work with masters pre 5.0.
Thanks go to the contributors for many features and bug fixes: Matthew Astley, Jakob Blomer, Ryan Boccabella, Peter Bui, Patrick Donnelly, Nathaniel Kremer-Herman, Victor Hawley, Nicholas Hazekamp, Peter Ivie, Kangkang Li, Haiyan Meng, Douglas Thain, Ben Tovar, Andrey Tovchigrechko, and Charles Zheng.
Please send any feedback to the CCTools discussion mailing list: http://ccl.cse.nd.edu/software/help/
Enjoy!






Wednesday, July 1, 2015

Preservation Framework for Computational Reproducibility at ICCS 2015

Haiyan Meng presented our work on a Preservation Framework for Computational Reproducibility at the International Conference on Computational Science (ICCS) in Reykjavik, Iceland. This is collaborative work between the University of Notre Dame and the University of Chicago, as part of the DASPOS project in which both universities participate.


The preservation framework proposed in this paper includes three parts:
  • First, how to use lightweight, application-level virtualization techniques to create a reduced package that includes only the necessary dependencies;
  • Second, how to organize a data storage archive to preserve these packages;
  • Third, how to distribute applications through standard software delivery mechanisms like Docker, and deploy them through flexible deployment mechanisms such as Parrot, PTU, Docker, and chroot.


Friday, June 19, 2015

Umbrella and Containers at VTDC 2015

Two CCL students presented their latest work at the Virtualization Technologies in Distributed Computing (VTDC) at the Symposium on High Performance Distributed Computing (HPDC) in Portland, Oregon.

Haiyan Meng presented her work on Umbrella, a system for specifying and materializing execution environments in a portable and reproducible way.  Umbrella accepts a declarative specification for an application, and then determines the minimum technology needed to deploy it.   The application will be run natively if the local execution environment is compatible, but if not, Umbrella will deploy a container, a virtual machine, or make use of a public cloud if necessary.

(PDF) Haiyan Meng and Douglas Thain,
Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids,
Workshop on Virtualization Technologies in Distributed Computing (VTDC) at HPDC, June, 2015. DOI: 10.1145/2755979.2755982

Charles Zheng presented his work on integrating Docker containers into the Makeflow workflow engine and the Work Queue runtime system, each with different tradeoffs in performance and isolation.  These capabilities will be included in the upcoming 5.0 release of CCTools.

(PDF) Charles Zheng and Douglas Thain,
Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker,
Workshop on Virtualization Technologies in Distributed Computing (VTDC), June, 2015. DOI: 10.1145/2755979.2755984

Wednesday, May 27, 2015

Lobster Talk at Condor Week 2015

Ben Tovar gave an overview of Lobster in the talk High-Energy Physics workloads on 10k non-dedicated opportunistic cores with Lobster. The talk was part of Condor Week 2015, at the University of Wisconsin-Madison.

Lobster is a system for deploying data intensive high-throughput science applications on non-dedicated resources. It is built on top of Work Queue, Parrot, and Chirp, which are part of CCTools.


Tuesday, May 19, 2015

Parrot and Lobster at CHEP 2015

CCL students gave two poster presentations at the annual Computing in High Energy Physics (CHEP) conference in Japan.  Both represent our close collaboration with the CMS HEP group at Notre Dame:


Haiyan Meng presented A Case Study in Preserving a High Energy Physics Application.  This poster describes the complexity of preserving a non-trivial application, shows how Parrot's packaging technology can be used to capture a program's dependencies, and then re-executes the program using a variety of technologies.





Anna Woodard and Matthias Wolf won the best poster presentation award for Exploiting Volatile Opportunistic Computing Resources with Lobster, which was rewarded with a lightning plenary talk.  Lobster is an analysis workload management system which has been able to harness 10-20K opportunistic cores at a time for large workloads at Notre Dame, making the facility comparable in size to the dedicated Tier-2 facilities of the WLCG!


Monday, May 4, 2015

Peter Sempolinski Defends Ph.D.

Dr. Peter Sempolinski successfully defended his Ph.D. thesis, titled "An Extensible System for Facilitating Collaboration for Structural Engineering Applications".

While at Notre Dame, Peter created the Virtual Wind Tunnel, which enabled the crowdsourcing of structural design and evaluation by combining online building design with Google SketchUp and CFD simulation with OpenFOAM.  The system was used in a variety of contexts, ranging from virtual engineering classes to managing work crowdsourced via Mechanical Turk.  His work was recently accepted for publication in IEEE CiSE and PLOS ONE.

Congratulations to Dr. Sempolinski!

Friday, May 1, 2015

CMS Analysis on 10K Cores with Lobster

The CMS physics group at Notre Dame has created Lobster, a data analysis system that runs on O(10K) cores to process data produced by the CMS experiment at the LHC.  Lobster uses Work Queue to dispatch tasks to thousands of machines, Parrot with CVMFS to deliver the complex software stack from CERN, XRootD to deliver the LHC data, and Chirp and Hadoop to manage the output data. By using these technologies, Lobster is able to harness arbitrary machines and bring along the CMS computing environment wherever it goes.   At peak, Lobster at ND delivers capacity equal to that of a dedicated CMS Tier-2 facility!     (read more here)

Friday, April 10, 2015

Dinesh Rajan Defends Ph.D.

Dr. Dinesh Rajan successfully defended his Ph.D. thesis, titled "Principles for the Design and Operation of Elastic Scientific Applications on Distributed Systems".  He is currently an engineer at Amazon Web Services.

While at Notre Dame, he made significant contributions to the development of Work Queue and worked closely with scientists in biology and molecular dynamics to build highly scalable elastic applications such as the Accelerated Weighted Ensemble.  His most recent journal paper in IEEE TCC describes how to design self-tuning cloud applications.

Congratulations to Dr. Rajan!


Friday, March 27, 2015

Confuga: Scalable Data Intensive Computing for POSIX Workflows


Patrick Donnelly will present his work on the Confuga distributed filesystem at CCGrid 2015 in China:

Patrick Donnelly, Nicholas Hazekamp, and Douglas Thain, Confuga: Scalable Data Intensive Computing for POSIX Workflows, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May, 2015.
Confuga is a new active storage cluster file system designed for executing regular POSIX workflows. Users may store extremely large datasets on Confuga in a regular file system layout, with whole files replicated across the cluster. They may then operate on their datasets using regular POSIX applications, with defined inputs and outputs.



Confuga handles the details of placing jobs near data and minimizing network load so that the cluster's disk and network resources are used efficiently. Each job executes with all of its input file dependencies local to its execution, within a sandbox.

For those familiar with CCTools, Confuga operates as a cluster of Chirp servers with a single Chirp server operating as the head node. You may use the Chirp library, Chirp CLI toolset, FUSE, or even Parrot to upload and manipulate the data on Confuga.

To run a workflow on Confuga, we encourage you to use Makeflow. Makeflow will submit the jobs to Confuga using the Chirp job protocol and take care of ordering the jobs based on their dependencies.
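As a sketch, running a workflow against a Confuga head node might look like the following. The head-node address is made up, and the exact batch-type name and flags should be checked against the Makeflow and Confuga manuals:

```sh
# Run a workflow with Makeflow, dispatching jobs to a Confuga cluster
# via its Chirp head node. The host and port below are illustrative.
makeflow -T chirp --working-dir chirp://confuga.head.example.org:9094/ workflow.mf
```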


Tuesday, March 24, 2015

Makeflow Visualization with Cytoscape

We have created a new Makeflow visualization module which exports a workflow into an XGMML file compatible with Cytoscape.  Cytoscape is a powerful network graphing application with support for custom styles, layouts, annotations, and more. While this program is better known for visualizing molecular networks in biology, it can be used for any purpose, and we believe it is a powerful tool for visualizing Makeflow tasks.  Our visualization module was designed for and tested on Cytoscape 3.2. The following picture is a Cytoscape visualization of the example makeflow script provided in the User's Manual (http://ccl.cse.nd.edu/software/manuals/makeflow.html):



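For readers unfamiliar with Makeflow, each rule in a makeflow script uses a Make-like syntax: output files, a colon, input files, then the command on an indented line. The file and command names below are illustrative only:

```make
# Hypothetical Makeflow rule: produce output.dat from input.dat
# using a simulation binary that is itself an input dependency.
output.dat: simulate input.dat
	./simulate input.dat > output.dat
```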
To generate a Cytoscape graph from your makeflow script, simply run:

makeflow_viz -D cytoscape workflow.mf > workflow.xgmml
workflow.xgmml can then be opened in Cytoscape through File -> Import -> Network -> File.  We have also created a clean style for visualizing Makeflow tasks, style.xml, which is generated in the current working directory when you run makeflow_viz. To apply the style in Cytoscape, select File -> Import -> Style and choose the style.xml file.  Next, right-click the imported network and select "Apply Style...", then select "makeflow" from the dropdown menu.  This will add the proper colors, edges, arrows, and shapes for processes and files.

Cytoscape also has a built-in layout function which can be used to automatically rearrange nodes according to their hierarchy.  To access it, select Layout -> Settings, and a new window will pop up.  Select "Hierarchical Layout" from the dropdown menu, change the settings for that layout to your liking, and select "Execute Layout."  There is one caveat: with larger workflows, this auto-layout function can take a long time to complete.  This is because Cytoscape is designed for general graphs, and it does not appear to implement algorithms specific to DAGs that could take advantage of faster time complexities.  We have tested the auto-layout function with the following test cases:

Number of nodes    Number of edges    Time to layout nodes
114                258                20-30 seconds
2213               11526              2.5 hours
15245              30478              23 hours

After the layout completes, the graph should be displayed cleanly, and you can customize it further to your liking with the various options available in Cytoscape.  For more information about Cytoscape, visit http://cytoscape.org.