Wednesday, April 21, 2021

Lightweight Function Paper at IPDPS

Tim Shaffer, a Ph.D. student in the CCL, will present the paper "Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications" at the International Parallel and Distributed Processing Symposium (IPDPS) in May 2021.  This work is the result of a collaboration between our group and the Parsl workflow team at the University of Chicago, led by Kyle Chard.

Emerging large scale science applications may consist of many dynamic tasks to be run across a large number of workers.  When written in a Python-oriented framework like Parsl or FuncX, those tasks are not heavyweight Unix processes, but rather lightweight invocations of individual functions.  Running these functions at large scale presents two distinct challenges:


1 - The precise software dependencies needed by the function must be made available at each worker node.  These dependencies must be chosen accurately: too few, and the function won't work; too many, and the cost of distribution is too high.  We show a method for determining, distributing, and caching the exact dependencies needed at runtime, without user intervention.

2 - The right number of functions must be "packed" into large worker nodes that may have hundreds of cores and many GB of memory.  Too few, and the system is badly underutilized; too many, and performance will suffer or invocations will crash.  We show an automatic method for monitoring and predicting the resources consumed by categories of functions.  This results in resource allocation that is much more efficient than an unmanaged approach, and is very close to an "oracle" that predicts perfectly.
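To make the two challenges concrete, here is a minimal sketch in Python.  It is not the implementation from the paper or from Parsl or Work Queue: the names imported_modules and CategoryMonitor are hypothetical, the dependency scan only looks at static import statements in a function's source text, and the allocation policy (reserve the largest usage seen so far for a category) is an assumption chosen for simplicity.

```python
import ast
from collections import defaultdict


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source.

    Hypothetical helper: a static scan only, so dynamic imports are missed.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) belong to the local package; skip them.
            modules.add(node.module.split(".")[0])
    return modules


class CategoryMonitor:
    """Hypothetical tracker of observed resource usage per function category."""

    def __init__(self, default_cores=1, default_memory_mb=1024):
        # Allocation used for a category that has never been measured.
        self.default = (default_cores, default_memory_mb)
        self.observations = defaultdict(list)

    def record(self, category, cores, memory_mb):
        """Store the peak cores and memory measured for one finished invocation."""
        self.observations[category].append((cores, memory_mb))

    def predict(self, category):
        """Assumed policy: allocate the largest usage seen so far."""
        seen = self.observations.get(category)
        if not seen:
            return self.default
        return (max(c for c, _ in seen), max(m for _, m in seen))

    def slots(self, category, node_cores, node_memory_mb):
        """Number of invocations of this category that fit on one worker node."""
        cores, memory_mb = self.predict(category)
        return min(node_cores // cores, node_memory_mb // memory_mb)


source = "import numpy.linalg\nfrom os import path\nfrom . import sibling\n"
deps = imported_modules(source)

monitor = CategoryMonitor()
monitor.record("fit", cores=2, memory_mb=4000)
monitor.record("fit", cores=1, memory_mb=6000)
fit_slots = monitor.slots("fit", node_cores=64, node_memory_mb=128000)
```

In this sketch the "fit" category is predicted to need 2 cores and 6000 MB, so memory rather than core count limits how many invocations fit on the 64-core node.  A real system would also need to handle invocations that exceed their allocation, which this sketch does not attempt.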


The techniques shown in this paper are integrated into the Parsl workflow system from U-Chicago, and the Work Queue distributed execution framework from Notre Dame, both of which are open source software supported by the NSF CSSI program.

Citation:
  • Tim Shaffer, Zhuozhao Li, Ben Tovar, Yadu Babuji, TJ Dasso, Zoe Surma, Kyle Chard, Ian Foster, and Douglas Thain, Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications, IEEE International Parallel & Distributed Processing Symposium, May 2021.

Ph.D. Defense - Nathaniel Kremer-Herman

Congratulations to Dr. Kremer-Herman, who successfully defended his Ph.D. dissertation "Log Discovery, Log Custody, and the Web Inspired Approach for Open Distributed Systems Troubleshooting".  His work created a system, TLQ (Troubleshooting via Log Queries), that enables structured queries over distributed data logged by independent components, including workflows, batch systems, and application file access.  Prof. Kremer-Herman recently began a faculty position at Hanover College in Indiana.  Congrats!