Tuesday, November 20, 2018

Parallel Application Capacity Paper at Supercomputing 2018

Nate Kremer-Herman presented the paper A Lightweight Model for Right-Sizing Master-Worker Applications at the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing) on November 14, 2018 in Dallas, Texas. This year marked the 30th anniversary of the Supercomputing conference.

In A Lightweight Model for Right-Sizing Master-Worker Applications, we note that when running a parallel application at scale, a resource provisioning policy should minimize over-commitment (idle resources) and under-commitment (resource contention). However, users seldom know the quantity of resources to appropriately execute their application. Even with such knowledge, over- and under-commitment of resources may still occur because the application does not run in isolation. It shares resources such as network and filesystems.We formally define the capacity of a parallel application as the quantity of resources that may effectively be provisioned for the best execution time in an environment. We present a model to compute an estimate of the capacity of master-worker applications as they run based on execution and data-transfer times.

Although the model for provisioning these applications is important, a key insight from the paper comes from a diagram which demonstrates how a parallel application's scale relates to its total execution time. Let's start with the smallest case first. This graph's x-axis represents the scale of a parallel application (we can assume it is the number of machines utilized for this example). The y-axis represents the total execution time of the application.

Imagine we are domain scientists running some parallel analysis tool. With a scale of 1, our runtime will obviously be the slowest since we are not making use of the parallelism of the application. So we increase our scale to 10. Lo and behold, we see a marked decrease in the total execution time of the application!

So we try scaling up again. We go for broke and leap from a scale of 10 to 500. We notice our execution time is still decreasing! So, let's increase our scale one more time.

At a scale of 1,000 we see the limit to our scalability. Our total execution time has increased from the 500 scale execution. Why? There is a cost to acquiring and maintaining resources. For instance, we might have to start a virtual machine on every computer we use for our application. Starting up a VM takes time.

What we have failed to realize, however, is that we completely missed our optimum scale! The black line of the bottom graph shows the best execution time of this application (which occurs at a scale of 100). This is a key observation from the paper: though it is possible to manually re-run a parallel application with differing scales, it is highly probable we will not find the most appropriate scale to run our application such that our total execution time is minimized (the capacity of our application) unless our search for the optimum scale is exhaustive. This is an unrealistic expectation for most researchers since what matters most is the results of the analysis/simulation/etc. To make the lives of our users easier, we have implemented a lightweight model which does the heavy lifting of finding that appropriate scale for the user.

No comments:

Post a Comment