Session

Weekday Session 4: Automation

Location

Utah State University, Logan, UT

Abstract

In-orbit events and onboard malfunctions, which often manifest as telemetry faults, can compromise satellite missions. Monitoring such faults through ground operations is time-intensive and tedious, motivating advanced onboard, autonomous satellite resilience capabilities that increase the likelihood of mission success. We present a novel approach for onboard autonomous satellite fault diagnosis that leverages ensemble learning to jointly detect faults and attribute them to a probable cause. The framework consists of an ensemble of representation learners, including an Autoencoder (AE), Kalman Filter (KF), Gaussian Mixture Model (GMM), Long Short-Term Memory (LSTM) network, and the PCMCI causal discovery algorithm, which extract informative representations from satellite telemetry data. We then detect and classify faults from the fused representations using XGBoost, a gradient-boosted decision tree classifier. The implementation is modular and can be readily integrated into a larger fault response and decision-making system.
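The idea of fusing representations from several learners can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only two simple stand-ins (a scalar random-walk Kalman filter and a z-score) in place of the full ensemble, and the function names, noise variances, and sample telemetry values are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def kalman_innovations(series, process_var=1e-3, meas_var=1e-1):
    """Scalar random-walk Kalman filter; returns one innovation per sample.

    The variances are illustrative assumptions, not values from the paper.
    """
    estimate, error_var = series[0], 1.0
    innovations = []
    for measurement in series:
        # Predict: the random-walk model keeps the estimate; uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        innovation = measurement - estimate
        gain = error_var / (error_var + meas_var)
        estimate += gain * innovation
        error_var *= 1.0 - gain
        innovations.append(innovation)
    return innovations


def zscores(series):
    """Distance of each sample from the series mean, in standard deviations."""
    mu, sigma = fmean(series), pstdev(series)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]


def fuse(series):
    """Concatenate the per-sample outputs of each learner into one vector."""
    return [list(f) for f in zip(kalman_innovations(series), zscores(series))]


# Hypothetical temperature channel with a single anomalous sample at index 4.
telemetry = [20.0, 20.1, 19.9, 20.0, 35.0, 20.1, 20.0]
fused = fuse(telemetry)
for index, features in enumerate(fused):
    print(index, [round(value, 3) for value in features])
```

In the framework described above, fused vectors of this kind would be passed to the classifier (XGBoost in the abstract); that step is omitted here to keep the sketch free of external dependencies.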

While developing our prototype, we recognized that the scarcity of historical faulting telemetry data and detailed ground-truth labels is a major limitation on improving autonomous fault diagnosis approaches, both for training machine learning algorithms and for quantitatively validating and comparing methods. To address this limitation, we develop a novel statistical telemetry simulation tool, called SatFaultSim, that generates non-faulting and faulting data for training and validating fault detection and attribution algorithms. SatFaultSim can model 11 common fault cases, such as ionizing radiation faults, system resets, and thermal and current faults. Each case statistically emulates fault scenarios observed during past satellite missions. The tool generates faults from user-supplied configuration files and is extensible to additional fault cases.
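As a rough illustration of configuration-driven fault injection, the sketch below generates a noisy periodic signal and overlays a labeled fault window. The configuration keys, the `thermal_spike` name, and the `simulate` function are hypothetical; the abstract does not describe SatFaultSim's actual schema or interface.

```python
import math
import random

# Hypothetical configuration; SatFaultSim's real schema is not given in the abstract.
CONFIG = {
    "samples": 200,
    "period": 50,        # samples per simulated orbit (assumed)
    "noise_std": 0.05,   # standard deviation of Gaussian sensor noise (assumed)
    "faults": [
        {"type": "thermal_spike", "start": 120, "duration": 10, "magnitude": 5.0},
    ],
}


def simulate(config, seed=0):
    """Return (values, labels): nominal telemetry with configured faults overlaid."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    n = config["samples"]
    values = [
        math.sin(2 * math.pi * t / config["period"]) + rng.gauss(0.0, config["noise_std"])
        for t in range(n)
    ]
    labels = ["nominal"] * n
    for fault in config["faults"]:
        stop = min(fault["start"] + fault["duration"], n)
        for t in range(fault["start"], stop):
            values[t] += fault["magnitude"]  # additive offset during the fault window
            labels[t] = fault["type"]        # ground-truth label for training/validation
    return values, labels


values, labels = simulate(CONFIG)
print(labels.count("nominal"), labels.count("thermal_spike"))
```

Because every sample carries a ground-truth label, data produced this way can be split directly into training and validation subsets.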

We use SatFaultSim to train and validate our ensemble learning approach, demonstrating both the capability of the simulation tool and the effectiveness of our algorithmic techniques for satellite fault diagnosis. We simulate a 10-orbit dataset of faulting and non-faulting telemetry samples and split it into training and validation subsets. After training the machine learning models, we perform an in-depth evaluation, which yields an overall combined fault detection and attribution accuracy of 99.89%, a detection accuracy of 99.94%, and an average fault attribution accuracy of 99.97%. The false positive and false negative rates are below 0.013% for every fault type. These metrics show that our approach is highly capable of identifying and attributing known faults and can serve as a baseline for autonomous fault diagnosis methods. The algorithmic framework and simulation tool are implemented in Python and available on GitHub. Moving forward, our work will facilitate further improvements to autonomous fault resilience, such as the integration of non-parametric and unsupervised learning techniques to accommodate unseen and rare fault types. Our approach can also enhance an autonomous decision-making framework by informing an onboard fault response or mitigation system.
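One plausible way to compute metrics of this kind from per-sample labels is sketched below. The exact definitions used in the paper are not stated in the abstract, so the definitions here (combined accuracy as exact label match, detection accuracy as agreement on fault versus nominal, and one-vs-rest false positive and false negative rates per fault type) are assumptions, as are the function name and the toy labels.

```python
def diagnosis_metrics(truth, predicted, nominal="nominal"):
    """Return (combined accuracy, detection accuracy, per-fault FPR/FNR).

    Definitions are assumptions for illustration, not the paper's.
    """
    pairs = list(zip(truth, predicted))
    n = len(pairs)
    # Combined: the predicted label matches exactly (fault detected and attributed).
    combined = sum(t == p for t, p in pairs) / n
    # Detection: truth and prediction agree on "fault vs. nominal", any fault type.
    detection = sum((t == nominal) == (p == nominal) for t, p in pairs) / n
    per_fault = {}
    for cls in sorted(set(truth) - {nominal}):
        false_pos = sum(t != cls and p == cls for t, p in pairs)
        false_neg = sum(t == cls and p != cls for t, p in pairs)
        negatives = sum(t != cls for t, _ in pairs)
        positives = sum(t == cls for t, _ in pairs)
        per_fault[cls] = {
            "fpr": false_pos / negatives if negatives else 0.0,
            "fnr": false_neg / positives,  # positives > 0 because cls comes from truth
        }
    return combined, detection, per_fault


# Toy labels: one nominal sample flagged as a fault, one fault misattributed.
truth = ["nominal"] * 6 + ["reset"] * 2 + ["thermal"] * 2
predicted = ["nominal"] * 5 + ["thermal", "reset", "reset", "thermal", "reset"]
combined, detection, per_fault = diagnosis_metrics(truth, predicted)
print(combined, detection, per_fault)
```

In this toy case the misattributed fault lowers the combined accuracy but not the detection accuracy, which is why the two figures are reported separately.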

Available for download on Friday, August 02, 2024

Aug 6th, 1:45 PM

Ensemble Learning for Autonomous Onboard Satellite Fault Diagnosis With Validation Tool