Friday, March 27, 2015

Confuga: Scalable Data Intensive Computing for POSIX Workflows

Patrick Donnely will present his work on the Confuga distributed filesystem at  CCGrid 2015 in China:

Patrick Donnelly, Nicholas Hazekamp, Douglas Thain,Confuga: Scalable Data Intensive Computing for POSIX Workflows, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May, 2015.  
Confuga is a new active storage cluster file system designed for executing regular POSIX workflows. Users may store extremely large datasets on Confuga in a regular file system layout, with whole files replicated across the cluster. You may then operate on your dataset using regular POSIX applications, with defined inputs and outputs.

Confuga handles the details of placing jobs near data and minimizing network load so that the cluster's disk and network resources are used efficiently. Each job executes with all of its input file dependencies local to its execution, within a sandbox.

For those familiar with CCTools, Confuga operates as a cluster of Chirp servers with a single Chirp server operating as the head node. You may use the Chirp library, Chirp CLI toolset, FUSE, or even Parrot to upload and manipulate the data on Confuga.

For running a workflow on Confuga, we encourage you to use Makeflow. Makeflow will submit the jobs to Confuga using the Chirp job protocol and take care of ordering the jobs based on their dependencies.

No comments:

Post a Comment