Control and Validation Mechanisms for Information, Resources, and Deployments in Distributed Real-time and Embedded Systems
Edmondson, James Raymon
In the past decade, distributed real-time and embedded (DRE) systems have emerged as an interesting and challenging class of target applications within the mission-critical software domain. Examples of existing DRE systems include air traffic management systems, cyber-physical systems composed of sensors and actuators in cooperating teams of robots, industrial control systems, lunar rovers, flight control computers, and total ship computing environments. Each of these systems has its own set of special challenges, but most are characterized by stringent quality-of-service requirements despite scarce resource availability, connectivity, and control. With the push toward multi-core and cloud computing initiatives, deployment and testing of DRE systems have become even more complex. If DRE system developers are to harness highly parallel, anonymous systems for DRE deployment and testing, more tools and techniques are necessary for control of deployment and testing information, resources, and quality-of-service. Current deployment systems for DRE are highly specialized toward one-shot deployments of DRE systems as specified at design time. Consequently, given the ever-changing nature of mission-critical deployments and hardware, static design-time deployments may not reflect the real nature of the targeted testbed or, even worse, the real-world deployment environment with regard to latency and resource availability in the network. Moreover, current deployment solutions do not provide developers with the right set of tools for sequencing and automating tests to validate the DRE system before it goes live. Additionally, once a deployment is live, conditions in the network may deteriorate until the DRE system is ineffective. We believe better solutions are possible that allow DRE systems to self-heal and redeploy themselves according to high-level workflows provided by users.
This dissertation presents a set of research contributions to the area of deployment and testing frameworks for DRE systems, along with results from deployment and testing research and tools developed under the QED and AMMO projects at ISIS. First, a reasoning engine for distributed real-time systems is presented for monitoring and mutating DRE system environment information with microsecond latencies on evaluation and dissemination. This reasoning engine is used extensively in our deployment and testing solutions to provide fast, fine-grained control over decision-making processes during testing and deployment. Second, we detail a quality-of-service-enabled distributed mutual exclusion algorithm that allows DRE system developers to decrease the latency of resource acquisition for important applications or components in a deployment or testing scenario. Third, we describe research into a deployment and testing framework that harnesses the reasoning engine for sequencing and coordinating activities between processes in the DRE system. Lastly, we describe the development of heuristic-based approaches for approximating the optimal subgraph isomorphism problem, an NP-complete problem that is relevant to adaptive redeployment of networked systems. We also outline a networking infrastructure for the latency collection, aggregation, summation, and voting schemes required for redeploying enterprise DRE systems according to user-provided dataflows.