Thursday, October 4, 2012

NSF Grant: Data and Software Preservation for Open Science


Mike Hildreth, Professor of Physics, Jarek Nabrzyski, Director of the Center for Research Computing and Concurrent Associate Professor of Computer Science and Engineering, and Douglas Thain, Associate Professor of Computer Science and Engineering, are the lead investigators on a project that will explore solutions to the problem of preserving data and analysis software, along with how these relate to the results obtained from the analysis of large datasets.


Known as Data and Software Preservation for Open Science (DASPOS), the project is focused on High Energy Physics data from the Large Hadron Collider (LHC) and the Fermilab Tevatron. The group will also survey and incorporate the preservation needs of other communities, such as Astrophysics and Bioinformatics, where large datasets and the results derived from them are becoming the core of emerging science.


The three-year, $1.8M program, funded by the National Science Foundation, will include several international workshops and the design of a prototype data- and software-preservation architecture that meets the functional needs of the participating scientific disciplines. What is learned from building this prototype will inform the design and construction of the global data- and software-preservation infrastructure for the LHC, and potentially for other disciplines.


The multi-disciplinary DASPOS team includes particle physicists, computer scientists, and digital librarians from Notre Dame, the University of Chicago, the University of Illinois Urbana-Champaign, the University of Nebraska at Lincoln, New York University, and the University of Washington, Seattle.

Wednesday, October 3, 2012

Tutorial on Scalable Programming at Notre Dame


Tutorial: Introduction to Scalable Programming with Makeflow and Work Queue

October 24th, 3-5PM, 303 Cushing Hall


Register here (no fee) to reserve your spot in the class:

http://www.nd.edu/~ccl/software/tutorials/ndtut12


Would you like to learn how to write programs that can scale up to
hundreds or thousands of machines?


This tutorial will provide an introduction to writing scalable
programs using Makeflow and Work Queue. These tools are used at Notre
Dame and around the world to attack large problems in fields such as
biology, chemistry, data mining, economics, physics, and more. Using
these tools, you will be able to write programs that can scale up to
hundreds or thousands of machines drawn from clusters, clouds, and
grids.
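

To give a flavor of what the tutorial covers, here is a small sketch
of a Makeflow: a Make-style file in which each rule names its output
files, its input files, and the command that produces the outputs.
The file and program names below are made up for illustration.

    # Each rule lists outputs, inputs, and the command connecting them.
    # Makeflow runs rules whose inputs are ready, possibly in parallel.

    stats.1: sim.py params.1
        python sim.py params.1 > stats.1

    stats.2: sim.py params.2
        python sim.py params.2 > stats.2

    summary.txt: stats.1 stats.2
        cat stats.1 stats.2 > summary.txt

The same file can run on your local machine, or be dispatched to many
machines by starting Work Queue workers elsewhere, roughly along the
lines of "makeflow -T wq sim.makeflow" on the submit host and
"work_queue_worker" on each remote machine; the exact options are
covered in the tutorial.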


This tutorial is appropriate for new graduate students, undergraduate
researchers, and research staff involved in computing in any
department on campus. Some familiarity with Unix and the ability to
program in Python, Perl, or C is required.
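

For example, a program that farms tasks out to remote workers using
the Work Queue Python API looks roughly like the sketch below. The
file names and details are illustrative and may differ slightly from
the version used in class.

    from work_queue import WorkQueue, Task, WORK_QUEUE_INPUT, WORK_QUEUE_OUTPUT

    # Create a master process that listens for workers on a port.
    q = WorkQueue(port=9123)

    # Submit one task per parameter file; each task names its own files.
    for i in range(100):
        t = Task("python sim.py params.%d > stats.%d" % (i, i))
        t.specify_file("sim.py", "sim.py", WORK_QUEUE_INPUT, cache=True)
        t.specify_file("params.%d" % i, "params.%d" % i, WORK_QUEUE_INPUT)
        t.specify_file("stats.%d" % i, "stats.%d" % i, WORK_QUEUE_OUTPUT)
        q.submit(t)

    # Wait for tasks to finish; workers may join or leave at any time.
    while not q.empty():
        t = q.wait(5)
        if t:
            print("task %d returned %d" % (t.id, t.return_status))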


The class will consist of half lecture and half hands-on instruction
in a computer-equipped classroom. The instructors are Dinesh Rajan
and Michael Albrecht, PhD students in the CSE department and
developers of the software.


For questions about the tutorial, contact Dinesh Rajan, dpandiar AT nd.edu.

Monday, October 1, 2012

Global Access to High Energy Physics Software with Parrot and CVMFS

Scientists searching for the Higgs boson have profited from Parrot's new support for the CernVM Filesystem (CVMFS), a network filesystem tailored to providing world-wide access to software installations. By using Parrot, CVMFS, and additional components integrated by the Any Data, Anytime, Anywhere project, physicists working in the Compact Muon Solenoid experiment have been able to create a uniform computing environment across the Open Science Grid. Instead of maintaining large software installations at each participating institution, Parrot is used to provide access to a single highly-available CVMFS installation of the software from which files are downloaded as needed and aggressively cached for efficiency. A pilot project at the University of Wisconsin has demonstrated the feasibility of this approach by exporting excess compute jobs to run in the Open Science Grid, opportunistically harnessing 370,000 CPU-hours across 15 sites with seamless access to 400 gigabytes of software in the Wisconsin CVMFS repository.
  
- Dan Bradley, University of Wisconsin and the Open Science Grid
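

In practice, a job on a worker node reaches the repository simply by running its commands under Parrot, which presents CVMFS as an ordinary directory tree. A minimal sketch is shown below; the proxy address and the job command are illustrative, not part of any particular site's configuration.

    # Site-specific HTTP proxy used to cache CVMFS downloads (illustrative).
    export HTTP_PROXY="http://squid.example.edu:3128"

    # Browse the CMS software repository as if it were locally mounted.
    parrot_run ls /cvmfs/cms.cern.ch

    # Run a CMS job that sets up its environment from the repository.
    parrot_run bash -c 'source /cvmfs/cms.cern.ch/cmsset_default.sh && cmsRun config.py'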