Monday, March 4, 2024

TaskVine at the HEP Analysis Grand Challenge

Barry Sly-Delgado and Ben Tovar recently presented our work on transforming high energy physics data analysis applications into near-interactive execution at the IRIS-HEP Analysis Grand Challenge Demo Day.

By using the TaskVine framework, we have enabled the transformation of a production physics data analysis application (DV3), reducing execution time from over an hour to just over three minutes on about four thousand cores of a campus HPC cluster.



Advances in the hardware and software components of an application stack make it possible to reshape applications: to transition them from long-running to near-interactive. While improvements to hardware and low-level software yield measurable speedup, the largest gains come from optimizing the scheduling and execution layers of the distributed application.

TaskVine, our latest workflow technology, makes use of in-cluster bandwidth and disk to mitigate data movement by enabling peer transfers between worker nodes within a compute cluster.  TaskVine is currently used to run a variety of custom data analysis workflows written by the CMS physics group at Notre Dame.  These applications are written in high-level Python, making use of standard technologies like Numpy, Coffea, and Dask in order to generate large task graphs.  TaskVine then takes that graph and deploys it into the cluster.
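The graphs these applications produce follow the standard Dask shape: a dictionary mapping keys to either literal values or task tuples whose arguments name other keys. A minimal illustration in plain Python (the graph and the toy `execute` helper below are illustrative stand-ins, not part of the TaskVine API, which dispatches each ready task to a worker in the cluster instead of resolving it locally):

```python
# A Dask-style task graph: keys map to either literal values or
# (callable, *args) tuples whose arguments may reference other keys.
def add(x, y):
    return x + y

graph = {
    "a": 1,
    "b": 2,
    "partial": (add, "a", "b"),   # depends on keys "a" and "b"
    "total": (add, "partial", "a"),
}

def execute(graph, key):
    """Toy recursive executor: resolve a key by first computing its
    dependencies. TaskVine instead schedules each ready task onto a
    worker node and moves the results through the cluster."""
    node = graph[key]
    if isinstance(node, tuple):
        fn, *args = node
        return fn(*(execute(graph, a) for a in args))
    return node

print(execute(graph, "total"))  # (1 + 2) + 1 = 4
```

The point of the dictionary representation is that the scheduler can see the whole dependency structure up front and decide where each task, and its data, should land.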
 
Our previous versions of these physics applications used our earlier Work Queue executor.  TaskVine improves upon the earlier system in two distinct ways:

First - While Work Queue only transfers data between the manager and workers, TaskVine transfers data directly between peer workers.  For workflows that generate large amounts of intermediate data, routing everything through the manager is costly to scheduling performance, because the manager must send and receive every byte.  Peer transfers relieve that pressure by moving data dependencies directly between worker nodes within the cluster.
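The benefit is easy to quantify with a toy model: if each intermediate output is consumed by a downstream task on another worker, manager-mediated transfer moves every byte twice over the manager's single link (worker to manager, then manager to worker), while a peer transfer moves it zero times. A back-of-the-envelope sketch (the task count and data sizes below are illustrative placeholders, not measurements from DV3):

```python
def manager_bytes(num_tasks, bytes_per_output, peer_transfers):
    """Bytes of intermediate data forced through the manager's link.

    Without peer transfers, each output is uploaded to the manager and
    downloaded again by the consuming worker; with peer transfers, data
    moves worker-to-worker and the manager carries only scheduling
    messages."""
    if peer_transfers:
        return 0
    return 2 * num_tasks * bytes_per_output

# Illustrative scale: 10,000 tasks, each producing 100 MiB of output.
without_peers = manager_bytes(10_000, 100 * 1024**2, peer_transfers=False)
with_peers = manager_bytes(10_000, 100 * 1024**2, peer_transfers=True)
print(without_peers / 1024**4)  # roughly 1.9 TiB through one link
print(with_peers)               # 0
```

At this scale, the manager's network link becomes the bottleneck for the whole workflow, which is exactly the pressure peer transfers remove.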

This animation shows the data transfers throughout an application when using Work Queue:

This animation shows the data transfers between all nodes of the cluster when using TaskVine:
Second - TaskVine extends the notion of task execution into a serverless model.  Python code in the analysis application can be defined once as a LibraryTask which is deployed into the cluster.  Then, lightweight FunctionCall tasks can invoke the code in the LibraryTask without repeatedly restarting the Python environment.  In this model, the Python interpreter is invoked only once per worker.
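The savings come from amortizing environment startup. A toy cost model makes the arithmetic concrete (the startup and run times below are made-up placeholders, not DV3 measurements): normal tasks pay the interpreter and import cost on every invocation, while the serverless model pays it once per worker and each FunctionCall pays only the run time.

```python
def normal_task_time(calls, startup, run):
    # Every normal task re-launches the Python environment before running.
    return calls * (startup + run)

def serverless_time(calls, startup, run):
    # The LibraryTask starts the environment once; each FunctionCall
    # then reuses the already-running interpreter.
    return startup + calls * run

# Illustrative numbers: 5 s to start an environment, 0.5 s of real work.
calls, startup, run = 1000, 5.0, 0.5
print(normal_task_time(calls, startup, run))  # 5500.0 seconds
print(serverless_time(calls, startup, run))   # 505.0 seconds
```

When the per-call work is short relative to startup, as in fine-grained analysis tasks, the serverless model dominates, which is what the measured distributions below reflect.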

This graph compares the distribution of execution times for normal tasks and serverless function calls: the serverless model improves the lower limit of execution time by an order of magnitude.

These four graphs show the overall performance of the data analysis application through four improvement states: first with Work Queue and Hadoop, then with Work Queue and VAST (a new HPC filesystem), then TaskVine with peer transfers, and finally TaskVine with serverless execution.  Overall, this transforms a one-hour workflow into a three-minute workflow.

 

