Wednesday, October 27, 2010

From Database to Filesystem and Back Again

Hoang Bui is leading the development of ROARS: a Rich Object Archival System, which is our generalization many of the ideas expressed in the Biometrics Research Grid. Hoang presented a paper on ROARS at the workshop on Data Intensive Distributed Computing earlier this year.

What makes ROARS particularly interesting is that it combines elements of both relational databases and file systems, and makes it possible to swap back and forth between both representations of the data.

A ROARS repository is an unordered collection of items. Each item consists of a binary file and metadata that describes the file. The metadata does not have a schema; you can attach whatever properties you like to an object. Here is an example item consisting of a iris image with five properties:

fileid = 356
subjectid = "S123"
color = "Blue"
camera = "Likon"
date = "23-Oct-2010"
type = "jpeg"



If you like to think in SQL, then you can query the system via SQL and you get back tabular data, as you might expect:

SELECT fileid, subjectid, color FROM irises WHERE color='Blue';

Of course, if you are going to actually process the files in some way, you need to put them into a filesystem where your scripts and tools can access them. For this, you use the EXPORT command, which will produce the files. EXPORT has a neat bit of syntax in which you can specify that the name of each file is generated from the metadata. For example this command:

EXPORT irises WHERE camera='Likon' AS color/subjectid.type

will dump out all of the matching files, put them into directories according to color, and name each file according to the subject and the file type. The example above would be named "Blue/S123.jpeg". (If the naming scheme doesn't result in unique filenames, then you need to adjust it to include something that is unique like fileid.)

Of course, if you are going to process a huge amount of data, then you don't actually want to copy all of it out to your local filesystem. Instead what you can do is create a "filesystem view" which is a directory tree containing pointers back to the objects in the repository. That has a very similar syntax:

VIEW irises WHERE camera='Likon' AS color/subjectid.type

Creating a filesystem view is much faster than exporting the actual data. Now, you can run your programs or scripts to iterate over the files. As they open up each file, the repository is accessed directly to open and read the necessary file data. (This is accomplished transparently by using Parrot to connect to the repository.)

The end result: a database that looks like a filesystem!

Monday, October 18, 2010

Summer REU: Toward Elastic Scientific Applications

In recent months, we have been working on the problem of building elastic parallel applications that can adapt to the available resources at run-time. Much has been written about elastic internet services, but scientific applications have a ways to catch up.

Traditional parallel applications are rigid: the user chooses how many nodes (or cores or CPUs) to use when the program starts. If more resources become available, or the application needs to grow, it is stuck. Even worse, if a node is lost due to a failure or a scheduling change, the program must be aborted. Rigid parallelism has been used for many years in dedicated clusters and supercomputers in the form of libraries such as MPI. It works fine for systems of tens or hundreds of nodes, but if you try to go bigger, it gets harder and harder to find a fully reliable system.

In contrast, an elastic parallel application can be modified at run-time to use greater or fewer resources as they become available, or if the size of the problem changes. Typically, an elastic application has one central coordinating node that tracks the progress of the program, and dispatches work units to other nodes. If new nodes are added to the system, the coordinator gives it some work to do. If a node fails or is removed, the coordinator makes a note of this, and sends the work to another node.

If you have an elastic application, then it becomes much easier to harness large scale computing systems such as clouds and grids. In fact, it becomes easier to harness any kind of computer, because you don't have to worry about them being reliable or even particularly fast. It's also useful in a traditional computing center, because you don't have to sit idle waiting for your ideal number of nodes to become free -- you can start work with whatever is available now.

The only problem is, most existing applications are rigidly parallel. Is it feasible to convert them into elastic applications?

We hosted two REU students to address this question: Anthony Canino, from SUNY-Binghamtom, and Zachary Musgrave, from Clemson University. Each took an existing rigid application and converted it into an elastic parallel application using our Work Queue framework. Work Queue has a simple C API, and makes use of a universal Worker executable that can be submitted to multiple remote systems. A Work Queue application looks like this:
Anthony worked on the problem of replica exchange, which is a technique for running molecular simulations in parallel at different energy levels, in order to achieve a more rapid exploration of the energy landscape. Our friends in the Laboratory for Computational Life Sciences down the hall have developed a molecular dynamics engine known as Protomol, and then implemented replica exchange using MPI. Anthony put Protomol and Work Queue together to create an implementation of replica exchange that can run on an arbitrary number of processors, and demonstrated it running on Condor and SGE simultaneously hundreds of nodes. What's even better is that the computation kernel was simply the sequential version of Protomol, so we avoided all of the software engineering headaches that would come with changing the base software.

Zachary worked with the genome annotation tool Maker, which is used to do things like finding protein sequences within an existing genome. Maker was already parallelized using Perl-MPI, so this required Zach to do some reverse engineering to get at the basic algorithm. However, it became clear that the MPI aspect was simply doling out work units to each node, with the additional optimization of work stealing. Zach added a Perl interface to Work Queue, and converted Maker into an elastic application running on hundreds of nodes. We are currently integrating Maker into Biocompute, our local bioinformatics portal.
Speaking of Biocompute, Notre Dame student Brian Kachmarck did a nice job this summer of re-working the user interface to the web site. Not only is it faster and more visually appealing, it also does a better job of presenting the Data-Action-Queue concept described in our recent paper about the system.