Congratulations to Dr. Donnelly!
The continued exponential growth of storage capacity has catalyzed the broad acquisition of scientific data that must be processed. While today's large data analysis systems are highly effective at establishing data locality and eliminating inter-dependencies, they are not easily incorporated into scientific workflows, which are often complex and irregular graphs of sequential programs with multiple dependencies. To address the needs of scientific computing, I propose the design of an active storage cluster file system that allows regular, unmodified applications to execute with full data locality.
This dissertation analyzes the potential benefits of exploiting the structural information already available in scientific workflows -- the explicit dependencies -- to achieve a scalable and stable system. I begin with an outline of the design of the Confuga active storage cluster file system and its applicability to scientific computing. The remainder of the dissertation examines the techniques used to achieve a scalable and stable system. First, file system access by jobs is scoped to explicitly defined dependencies resolved at job dispatch. Second, the workflow's structural information is harnessed to direct and control necessary file transfers to enforce cluster stability and maintain performance. Third, control of transfers is selectively relaxed to improve performance by limiting any negative effects of centralized transfer management.
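The first two techniques can be illustrated with a minimal sketch. It is not Confuga's implementation: the names `Job`, `resolve_namespace`, `pick_node`, and `plan_transfers`, the replica map, and the "most local bytes" placement rule are all assumptions made for illustration. The sketch binds a job's declared inputs to replica locations at dispatch, picks a node that already holds most of the data, and lists the transfers a central scheduler would then have to direct.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A hypothetical job with explicitly declared file dependencies."""
    name: str
    inputs: list[str]
    outputs: list[str] = field(default_factory=list)


def resolve_namespace(job: Job, replicas: dict[str, set[str]]) -> dict[str, set[str]]:
    """Bind each declared input to its replica holders at dispatch time.

    Only declared inputs appear in the result, so the job's view of the
    file system is scoped to its explicit dependencies. An unknown input
    raises KeyError.
    """
    return {path: replicas[path] for path in job.inputs}


def pick_node(namespace: dict[str, set[str]], sizes: dict[str, int], nodes: list[str]) -> str:
    """Choose the node already holding the most input bytes (assumed policy)."""
    def local_bytes(node: str) -> int:
        return sum(sizes[p] for p, holders in namespace.items() if node in holders)
    return max(nodes, key=local_bytes)


def plan_transfers(namespace: dict[str, set[str]], node: str) -> list[tuple[str, str]]:
    """List (path, source_node) transfers needed for full locality on `node`."""
    return [
        (path, sorted(holders)[0])  # deterministic choice of source replica
        for path, holders in namespace.items()
        if node not in holders
    ]


# Hypothetical cluster state: which nodes hold a replica of each file.
replicas = {"a.dat": {"n1", "n2"}, "b.dat": {"n2"}, "c.dat": {"n3"}}
sizes = {"a.dat": 10, "b.dat": 20, "c.dat": 5}
job = Job("analyze", inputs=["a.dat", "b.dat", "c.dat"])

namespace = resolve_namespace(job, replicas)
target = pick_node(namespace, sizes, ["n1", "n2", "n3"])
transfers = plan_transfers(namespace, target)
```

Because the transfer list is known before the job starts, a scheduler in this sketch could cap how many transfers run at once, which is one plausible way to read "direct and control necessary file transfers".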
This work benefits users by providing a complete batch execution platform joined with a cluster file system. The user does not need to redesign their workflow or give additional consideration to the management of data dependencies. System stability and performance are managed by the cluster file
system while providing jobs with complete data locality. - See more at: http://cse.nd.edu/events/phd-defense-patrick-donnelly#sthash.4BLYgJYM.dpuf
DATA LOCALITY TECHNIQUES IN AN ACTIVE CLUSTER FILE SYSTEM DESIGNED FOR SCIENTIFIC WORKFLOWS