Recent CCL graduate Chao Zheng, Ph.D., presented his paper "Autoscaling High Throughput Workloads on Container Orchestrators" at the IEEE CLUSTER 2020 conference in September 2020.
In this paper, we explore the problem of how many machines to acquire for a high-throughput workload of known size when running on a container orchestrator like Kubernetes.
Most approaches to autoscaling are designed for scaling up web servers or other services that respond to unpredictable external requests. Typically, the autoscaler watches a metric such as CPU utilization and adds or removes resources to hold that metric near a target, such as 90% CPU utilization.
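For concreteness, here is a minimal sketch of that kind of reactive scaling rule, modeled on the proportional formula used by the Kubernetes Horizontal Pod Autoscaler (the function name and parameters here are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float = 0.90) -> int:
    """Proportional scaling rule: grow or shrink the replica count
    so that observed CPU utilization moves toward the target."""
    return max(1, math.ceil(
        current_replicas * current_utilization / target_utilization))

# e.g. 10 replicas running at 99% CPU with a 90% target -> 11 replicas
print(desired_replicas(10, 0.99))  # 11
```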
However, when running a high-throughput workload of, say, one thousand simulation runs, the situation is different. First, high CPU utilization is the norm: the simulator is likely to peg each CPU at 100%, so adding or removing nodes won't change that metric. Second, the offered load is not a mystery: we control the workload, so we know its total size, or at least the number of jobs currently in the queue.
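To see why a known workload changes the calculation, consider a back-of-the-envelope sizing rule: if the queue length and average task runtime are known, the number of machines needed to finish by a given time follows directly. (All names and numbers below are illustrative, not the model from the paper.)

```python
import math

def nodes_needed(queued_tasks: int,
                 avg_task_runtime_s: float,
                 cores_per_node: int,
                 deadline_s: float) -> int:
    """Estimate how many nodes finish `queued_tasks` single-core
    tasks within `deadline_s`, assuming tasks pack onto cores."""
    total_core_seconds = queued_tasks * avg_task_runtime_s
    cores = math.ceil(total_core_seconds / deadline_s)
    return math.ceil(cores / cores_per_node)

# 1000 simulations x 600s each on 16-core nodes, done within an hour:
print(nodes_needed(1000, 600, 16, 3600))  # 11 nodes
```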
To address this, Chao built a High Throughput Autoscaler (HTA) that interfaces the Makeflow workflow system with the Kubernetes container orchestrator.
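The HTA itself is described in the paper; as a rough illustration of the idea, an autoscaler in this position can watch the workflow's queue and resize a pool of worker pods to match, rather than reacting to CPU utilization. A minimal sketch follows, where the deployment name, packing factor, and sizing policy are assumptions for illustration, not the HTA's actual implementation:

```python
import math
import subprocess
import time

TASKS_PER_WORKER = 4   # assumed packing of queued tasks per worker pod
MAX_WORKERS = 50       # assumed budget cap on the worker pool

def queued_tasks() -> int:
    """Stub: in practice, ask the workflow manager (here, Makeflow)
    how many tasks are still waiting in its queue."""
    return 1000

def scale_workers(replicas: int) -> None:
    """Resize a (hypothetical) worker Deployment via `kubectl scale`."""
    subprocess.run(
        ["kubectl", "scale", "deployment/makeflow-workers",
         f"--replicas={replicas}"],
        check=True,
    )

if __name__ == "__main__":
    while True:
        want = min(MAX_WORKERS,
                   math.ceil(queued_tasks() / TASKS_PER_WORKER))
        scale_workers(want)
        time.sleep(30)  # re-evaluate as the queue drains
```

The key design difference from utilization-based scaling is that the input signal is the queue itself, which shrinks as the workload drains, so the pool naturally scales to zero when the work is done.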
To learn more, check out the paper and accompanying video:
Chao Zheng, Nathaniel Kremer-Herman, Tim Shaffer, and Douglas Thain, "Autoscaling High Throughput Workloads on Container Orchestrators," IEEE International Conference on Cluster Computing (CLUSTER), pages 1-10, September 2020.