Session

Technical Poster Session 1

Location

Utah State University, Logan, UT

Abstract

Nearly half of all small satellites launched between 2000 and 2016 experienced partial or complete failures. Detecting the anomalies and faults responsible for such failures, and responding to them rapidly, may help increase the success rates of future small satellite missions. However, developing and implementing platform-specific, comprehensive fault management solutions can be cost-prohibitive for most small satellite teams. To enable such teams to achieve this capability quickly and cost-efficiently, we have developed a reusable, fully data-driven framework and associated algorithms to detect and isolate anomalous behaviors. This work, supported by NASA, utilizes correlations between system operational variables and Machine Learning (ML) techniques to generate real-time estimates of expected nominal behavior. Large differences between expected behavior and the behavior observed through sensor measurements may indicate the presence of anomalies or faults. Because this approach monitors sensor streams for deviations from learned nominal behavior, no fault database is required, and previously unknown anomalies or faults are also expected to be detected.
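The detection principle above (flagging large gaps between model-predicted nominal behavior and observed sensor readings) can be sketched in a few lines. The function name, normalization, and fixed threshold below are illustrative assumptions, not the authors' implementation, which uses cumulative errors and dynamic thresholds as described later:

```python
from statistics import median

def flag_residuals(predicted, observed, threshold=3.0):
    """Flag samples whose deviation from the predicted nominal value
    is large relative to the typical (median) deviation.
    Sketch only: a stand-in for the framework's cumulative-error test."""
    residuals = [abs(o - p) for p, o in zip(predicted, observed)]
    scale = median(residuals) or 1e-9  # avoid division by zero
    return [r / scale > threshold for r in residuals]

# Nominal prediction with one large observed deviation at index 5.
pred = [1.0] * 10
obs = [1.0] * 10
obs[5] = 10.0
flags = flag_residuals(pred, obs)  # only index 5 is flagged
```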

To automate this framework, a set of algorithms is developed that uses a small amount of nominal operational data from a satellite system (or subsystem) to train the required models. The algorithms exploit physical dependencies between the system’s operational variables, extracted using a Dynamic Time Warping (DTW) technique and ML regression models. The system’s battery metrics, current and voltage, serve as the roots of trust for diagnosing other operational variables, which are selected based on the DTW correlation strengths. Battery metrics are chosen because they can be measured independently and reliably. Fluctuations in a rolling window of battery current and voltage measurements are extracted into features such as the window average, maximum, minimum, and several others. These features (predictors) and the individual sensors’ readings (targets) are then used to train the ML regression models. During a mission, the same battery features are extracted in real time and fed to the trained models to estimate the sensor measurements expected under nominal system behavior. The cumulative error between predicted and observed measurements, and its slope, are then calculated. An anomaly flag is raised when these two values cross dynamic thresholds computed from their recent values and preset weights. Because each mapping from battery metrics to an operational variable is independent and one-to-one, an anomaly is simultaneously isolated to the affected sensor or the subsystem in which it resides. The framework also includes automated testing of the trained ML models and the selected anomaly detection parameters by artificially injecting different types of anomalies, including loose connections, abrupt sensor failure, sensor drift, and data corruption. This work discusses the implementation of the framework on datasets generated from laboratory tests on a CubeSat platform.
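The cumulative-error test described above can be sketched as follows; the window length (`history`) and `weight` are hypothetical tuning parameters standing in for the abstract's "recent values and preset weights":

```python
from collections import deque

def detect(predicted, observed, history=20, weight=3.0):
    """Raise an anomaly flag when both the cumulative absolute error
    and its slope cross dynamic thresholds computed from their own
    recent values scaled by a preset weight (illustrative sketch)."""
    cum_err, prev = 0.0, 0.0
    recent_err = deque(maxlen=history)
    recent_slope = deque(maxlen=history)
    flags = []
    for p, o in zip(predicted, observed):
        cum_err += abs(o - p)
        slope = cum_err - prev
        prev = cum_err
        # Dynamic thresholds: recent means scaled by the preset weight.
        err_thr = weight * sum(recent_err) / len(recent_err) if recent_err else float("inf")
        slope_thr = weight * sum(recent_slope) / len(recent_slope) if recent_slope else float("inf")
        flags.append(cum_err > err_thr and slope > slope_thr)
        recent_err.append(cum_err)
        recent_slope.append(slope)
    return flags

# A step fault injected at sample 20 trips the detector.
pred = [1.0] * 30
obs = [1.0] * 20 + [5.0] * 10
flags = detect(pred, obs)
```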
Results show a nearly 90% average detection rate and less than 1% average false-positive rate for many analog operational variables strongly correlated with battery metrics.
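The automated-testing step evaluates the trained models against synthetic anomalies of the kinds listed above. A minimal sketch of the injection side, covering three of those kinds; the parameter names (`start`, `magnitude`, `slope`) and the exact anomaly shapes are illustrative assumptions:

```python
def inject_anomaly(signal, kind, start, magnitude=5.0, slope=0.1):
    """Return a copy of `signal` with a synthetic anomaly from index
    `start` onward. Anomaly kinds mirror the categories named in the
    abstract; their exact shapes here are assumptions."""
    out = list(signal)
    for i in range(start, len(out)):
        if kind == "abrupt_failure":   # abrupt sensor failure: output sticks at zero
            out[i] = 0.0
        elif kind == "drift":          # sensor drift: slowly growing bias
            out[i] += slope * (i - start)
        elif kind == "corruption":     # data corruption: spikes on every other sample
            if (i - start) % 2 == 0:
                out[i] += magnitude
    return out

nominal = [1.0] * 10
drifted = inject_anomaly(nominal, "drift", start=5)
stuck = inject_anomaly(nominal, "abrupt_failure", start=5)
```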

SSC23-P1-23.pdf (635 kB)
SSC23-P1-23 Poster

Aug 8th, 9:45 AM

A Reusable Framework for Fault Detection and Isolation in Small Satellites
