Component-based Fault Tolerance for Distributed Real-Time and Embedded Systems
Wolf, Friedhelm
2009-04-20
Abstract
Component middleware has become increasingly important in
distributed real-time and embedded (DRE) systems. DRE systems
are characterized by resource constraints and stringent quality
of service (QoS) requirements. Growing demands on system
dependability in turn increase the importance of fault-tolerance
as a QoS aspect.
Research on fault-tolerance in DRE systems has focused mainly on
replication and recovery at the granularity of individual
distributed objects and processes. Component middleware provides
higher-level abstractions, such as a container infrastructure,
means of assembling components into larger units of functionality, and
standardized deployment mechanisms. These mechanisms provide new
opportunities to standardize fault-tolerance, but also pose new
challenges, such as efficient synchronization of internal component
state, failure correlation across groups of components, and
configuration of fault-tolerance properties per component.
This thesis makes three contributions to the research on
component-based fault-tolerance. First, we present Components with
HEterogeneous State Synchronization (CHESS), which is a mechanism for
component state replication that enables the flexible use of the
most appropriate communication mechanism. Second, we present
COmponent Replication based on Failover Units (CORFU), which provides
fail-stop behavior and fault correlation across groups of
components. Third, we present an evaluation of the proposed
solutions in comparison to existing object fault-tolerance methods.
These results show that DRE systems based on component middleware
ease the burden of application development by providing middleware
support for fault-tolerance at the component level. The results
also quantify the performance trade-off compared to object-level
fault-tolerance and show that it is acceptable for many DRE systems.