Session
Technical Session VII: 13th Annual Frank J. Redd Student Scholarship Competition
Abstract
We describe a fault-tolerant architecture designed to improve the reliability of space systems built from commercial off-the-shelf (COTS) devices, and to provide automated system recovery in the presence of radiation-induced functional errors. The architecture is aimed primarily at cost-effective “small satellite” systems, where very limited mass, volume and power resources preclude architectures based on multiple redundant systems. Our architecture is based on a fast data network that links all units of the data handling subsystem to an intelligent supervisor node. The supervisor monitors status messages from the units and intervenes when the state of a unit does not match expectations or when messages stop arriving. In such an event, the supervisor attempts to identify the nature of the fault and to recover the unit accordingly. The approach is therefore flexible enough to support whichever fault-tolerance strategy is deemed most suitable for the devices under consideration, given their failure modes and operating environments.
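The supervisor behaviour described in the abstract (watch status messages, detect a state mismatch or a silent unit, then choose a recovery) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the class and method names, the timeout value, and the mapping from fault type to recovery action are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Fault(Enum):
    NONE = "none"
    SILENT = "silent"                  # status messages stopped arriving
    STATE_MISMATCH = "state_mismatch"  # reported state differs from expected


# Hypothetical recovery policy; the abstract does not specify actual actions.
RECOVERY = {
    Fault.NONE: "no_action",
    Fault.STATE_MISMATCH: "soft_reset",
    Fault.SILENT: "power_cycle",
}


@dataclass
class UnitRecord:
    expected_state: str
    last_state: str
    last_seen: float  # time of the most recent status message, in seconds


class Supervisor:
    """Tracks the last status message of each unit and classifies faults."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # assumed silence threshold
        self.units = {}

    def register(self, unit_id, expected_state, now):
        self.units[unit_id] = UnitRecord(expected_state, expected_state, now)

    def on_status(self, unit_id, state, now):
        # Called whenever a status message arrives over the data network.
        rec = self.units[unit_id]
        rec.last_state = state
        rec.last_seen = now

    def diagnose(self, unit_id, now):
        rec = self.units[unit_id]
        if now - rec.last_seen > self.timeout_s:
            return Fault.SILENT
        if rec.last_state != rec.expected_state:
            return Fault.STATE_MISMATCH
        return Fault.NONE

    def recovery_action(self, unit_id, now):
        # Look up the (hypothetical) recovery for the diagnosed fault.
        return RECOVERY[self.diagnose(unit_id, now)]
```

In this sketch the time is passed in explicitly rather than read from a clock, which keeps the logic deterministic and easy to exercise; a flight implementation would presumably draw on an onboard time source and a richer fault model.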
Presentation Slides
System-Level Mitigation of SEFIs in Data Handling Architectures, A Solution for Small Satellites