Resource-Aware Deployment, Configuration, and Adaptation for Fault-Tolerant Distributed Real-Time Embedded Systems
There is growing demand for simultaneously supporting timeliness and high-availability quality of service (QoS) properties in distributed real-time and embedded (DRE) systems deployed in resource-constrained environments. Addressing this challenge is hard for several reasons, including the increasing complexity and scale of DRE systems, their unpredictable and failure-prone deployment environments, the conflicting demands that timeliness and high-availability QoS criteria place on available resources, the need for sophisticated load-aware runtime mechanisms and algorithms that respond to changing system workloads and resource availability, and the need to share system resources among multiple applications so that each application's QoS requirements are met in a resource-efficient fashion. To address these deployment, configuration, and adaptation needs, this dissertation provides a novel, holistic, middleware-based solution that makes the following contributions to advance the state of the art in real-time and fault-tolerant middleware for DRE systems:
• It describes and validates DeCoRAM, which allocates resources efficiently for applications and their replicas, producing a deployment-time solution that is real-time-aware, failure-aware, and resource-aware.
• It describes and validates NetQoPE, a model-driven resource provisioning engine that handles the configuration complexity of large-scale DRE systems, avoids the need for application developers to write invasive, application-specific code to obtain a real-time fault-tolerance solution, and automates the deployment and configuration of DRE systems.
• It describes and validates FLARe, an adaptive, load-aware middleware that responds to changing system loads and resource availability, selects failure recovery targets based on current per-processor resource availability, and dynamically enforces CPU utilization bounds to maintain desired server delays in the face of concurrent failures and load changes.
This dissertation also describes SwapCIAO, a QoS-enabled component middleware framework that handles overloads by updating components to implementations optimized for particular run-time characteristics. Extensive empirical evaluation demonstrates the feasibility of these middleware capabilities, each of which is exercised in several use cases based on requirements from production DRE systems.
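The FLARe contribution describes choosing failure recovery targets from current per-processor resource availability while enforcing CPU utilization bounds. The sketch below illustrates that general idea under simplifying assumptions that are not taken from this dissertation: each processor reports a CPU utilization in [0, 1], each server object has a known CPU load, and a backup may be promoted only if its host's projected utilization stays at or below a fixed bound. The names `Replica`, `select_failover_target`, `overloaded_hosts`, and `UTILIZATION_BOUND`, and the 0.7 bound itself, are hypothetical; this is not FLARe's actual algorithm or API.

```python
"""Hypothetical sketch of load-aware failover target selection.

Assumptions (illustrative only): per-processor CPU utilization in [0, 1],
a known CPU load per server object, and a fixed per-processor bound.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

UTILIZATION_BOUND = 0.7  # hypothetical per-processor utilization bound


@dataclass
class Replica:
    object_id: str  # server object this replica belongs to
    host: str       # processor hosting the replica


def select_failover_target(
    backups: List[Replica],
    utilization: Dict[str, float],
    failed_hosts: Set[str],
    object_load: float,
    bound: float = UTILIZATION_BOUND,
) -> Optional[Replica]:
    """Return the backup on the least-utilized live host that can absorb the load.

    A backup is eligible only if its host has not failed and promoting it keeps
    the host's projected utilization at or below `bound`. Returns None when no
    backup is eligible.
    """
    candidates = [
        r for r in backups
        if r.host not in failed_hosts
        and utilization[r.host] + object_load <= bound
    ]
    if not candidates:
        return None
    # Prefer the host with the most spare CPU capacity at this moment.
    return min(candidates, key=lambda r: utilization[r.host])


def overloaded_hosts(
    utilization: Dict[str, float], bound: float = UTILIZATION_BOUND
) -> List[str]:
    """Hosts whose measured utilization exceeds the bound (candidates for shedding load)."""
    return sorted(h for h, u in utilization.items() if u > bound)


if __name__ == "__main__":
    backups = [Replica("nav", "p1"), Replica("nav", "p3"), Replica("nav", "p4")]
    util = {"p1": 0.65, "p2": 0.9, "p3": 0.3, "p4": 0.1}
    # p4 has failed and p1 would exceed the bound (0.65 + 0.2 > 0.7), so p3 wins.
    print(select_failover_target(backups, util, {"p4"}, 0.2).host)  # p3
    print(overloaded_hosts(util))  # ['p2']
```

Choosing the least-utilized eligible host at failure time, rather than following a static backup order, avoids piling recovered primaries onto a processor that is already busy; a production middleware would additionally need fresh utilization measurements and coordination when several failures occur concurrently.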