Dynamic Safety Assurance of Autonomous Cyber-Physical Systems
Cyber-Physical Systems (CPSs) are ubiquitous in applications such as smart homes, medical devices, avionics, and automobiles. However, their ever-increasing complexity, domain interdependence, and dynamic operating conditions have raised concerns about using them in safety-critical applications. Typically, design-time safety assurance techniques such as assurance cases are used to construct a safety argument, supported by evidence, intended to demonstrate that the system will satisfy its assurance guarantees. The argument rests on certain assumptions about the behavior of the system's components and its operating environment. On deployment, however, the evolving nature of the system can invalidate these design-time assumptions, thereby defeating the safety argument. This problem is exacerbated by the use of data-driven Learning Enabled Components (LECs), such as Deep Neural Networks, in designing autonomous CPSs. Although these components perform well on data resembling their training data, the closed-world assumptions under which they are trained are often invalidated when they are deployed in open-world scenarios, known as the Out-of-Distribution (OOD) data problem. The invalidation of these design-time assumptions can increase the system's risk of unsafe consequences at runtime. There is therefore a need to continuously monitor these assumptions at runtime and quantify the risk they pose to the system. Existing design-time safety assurance techniques cannot do this; a dynamic safety assurance approach is required. This research aims to quantify the risk posed to an autonomous CPS and to select a suitable strategy for mitigating that risk at runtime. To this end, we have designed a dynamic safety assurance framework that combines design-time and runtime assurance approaches. The framework begins with (a) automated development of an assurance case, which holds the safety arguments for the system and the assumptions under which they are valid.
Once these arguments and assumptions are available, three components of the framework perform the following activities at runtime: (b) detect invalidation of the design-time assumptions and identify the factor(s) responsible for the problem; in particular, we are interested in detecting violations of the closed-world assumption under which the LECs are trained, which result in the OOD data problem; (c) assess the system's operational risk at runtime, given the current operating conditions, the known sensor and actuator faults, and the information from the detectors; and (d) mitigate the risk by dynamically selecting a suitable control action for the system. We refer to these as the dynamic assurance components. Finally, a data generation component produces the data required to develop the other components of the framework. We demonstrate the effectiveness of the proposed approach on two autonomous CPS testbeds: a resource-constrained, remote-controlled autonomous driving testbed called DeepNNCar, and an Autonomous Vehicle (AV) case study in the CARLA simulator.
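The runtime detect-assess-mitigate loop described above can be illustrated with a minimal sketch. The OOD detector below is a deliberately simple stand-in (a z-score distance from the training feature distribution) rather than the framework's actual learned detectors, and the risk weighting and controller names are illustrative assumptions, not the dissertation's model.

```python
import random
import statistics

class FeatureDistanceOODDetector:
    """Toy assumption monitor: flags inputs whose features lie far
    (in mean absolute z-score terms) from the training distribution.
    Real detectors in such frameworks are typically learned models."""

    def __init__(self, train_features, threshold=3.0):
        dims = list(zip(*train_features))
        self.means = [statistics.fmean(d) for d in dims]
        self.stds = [statistics.pstdev(d) + 1e-8 for d in dims]
        self.threshold = threshold

    def score(self, x):
        # Mean absolute z-score across feature dimensions.
        zs = [abs(v - m) / s for v, m, s in zip(x, self.means, self.stds)]
        return sum(zs) / len(zs)

    def is_ood(self, x):
        return self.score(x) > self.threshold

def select_control_action(ood_score, fault_risk, risk_limit=1.0):
    """Simplex-style mitigation sketch: combine the detector score with
    known sensor/actuator fault risk, and fall back to a conservative
    safe controller when the combined risk exceeds the limit.
    The 50/50 weighting is purely illustrative."""
    risk = 0.5 * ood_score + 0.5 * fault_risk
    action = "safe_controller" if risk > risk_limit else "lec_controller"
    return action, risk

# Usage: a nominal sample keeps the LEC in control; a far-out-of-
# distribution sample triggers the fallback.
random.seed(0)
train = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
detector = FeatureDistanceOODDetector(train)
nominal = [0.0] * 4
ood_sample = [6.0] * 4
```

In a Simplex-style arrangement like this, the safe controller is verified conventionally at design time, so the runtime monitor only needs to decide *when* to hand over control, not to certify the LEC itself.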