RESOURCE CONFIGURATION DECISIONS FOR BULK SYNCHRONOUS PARALLEL JOBS
Wang, Evan
0000-0002-0656-5841
2021-08-10
Abstract
In the Bulk Synchronous Parallel (BSP) model, multiple tasks perform computations concurrently and synchronize with each other at the boundaries between “supersteps”. For this type of job, the overall duration of the computation depends on the slowest task at each step. To reduce this runtime in a containerized cloud environment, resources must be provisioned adaptively in a way that prioritizes slower tasks. This thesis extends the EXPPO approach to finding optimal resource configurations for co-simulations by removing the constraint that task workloads are static. When workloads vary, the computation time of each task changes even under the same resource provisioning, which greatly reduces the effectiveness of a static configuration. This research addresses this challenge with more extensive workload profiling and with time-series methods for predicting task workloads. These predictions are repeatedly fed into a model predictive controller, which plans resource configuration decisions while accounting for the costs of reconfiguration. This research evaluates the performance of several resource configuration planning algorithms using simulated jobs in a Kubernetes environment.
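The dependence of job runtime on the slowest task can be illustrated with a minimal sketch. It is not the EXPPO algorithm or the controller developed in this thesis. The function names `bsp_runtime` and `greedy_allocate` are hypothetical, synchronization overhead is ignored, and the sketch assumes that a task's time equals its workload divided by its CPU count (perfect linear scaling), which real containers will not achieve exactly.

```python
from typing import List


def bsp_runtime(step_times: List[List[float]]) -> float:
    """Total job runtime when every superstep waits for its slowest task.

    step_times[s][t] is the time task t needs in superstep s.
    Synchronization overhead is ignored (assumption).
    """
    return sum(max(times) for times in step_times)


def greedy_allocate(workloads: List[float], total_cpus: int) -> List[int]:
    """Split total_cpus among tasks for one superstep, favoring slow tasks.

    Assumes time = workload / cpus (illustrative scaling model only).
    Each task starts with one CPU; every remaining CPU goes to the task
    whose estimated time is currently the largest.
    """
    if total_cpus < len(workloads):
        raise ValueError("need at least one CPU per task")
    cpus = [1] * len(workloads)
    for _ in range(total_cpus - len(workloads)):
        slowest = max(range(len(workloads)),
                      key=lambda i: workloads[i] / cpus[i])
        cpus[slowest] += 1
    return cpus


if __name__ == "__main__":
    # Two supersteps, three tasks: the slowest tasks take 3 and 4 time units.
    print(bsp_runtime([[1.0, 3.0, 2.0], [4.0, 1.0, 1.0]]))
    # One heavy task and two light ones sharing six CPUs.
    print(greedy_allocate([8.0, 2.0, 2.0], 6))
```

Under these assumptions, the heavy task receives most of the spare CPUs until its estimated time matches the others. With non-static workloads, the `workloads` argument would have to come from a prediction for the upcoming step rather than from a fixed profile, and a planner would also need to weigh the cost of changing the allocation between steps.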