Session

Weekend Session 6: Advanced Concepts - Research & Academia III

Location

Utah State University, Logan, UT

Abstract

The goal of this project was to create a system capable of autonomous operation with minimal telemetry requirements while operating within the limits of on-orbit compute and power reserves. Previous work has centered on the use of GPUs to train Deep Reinforcement Learning (DRL) agents for autonomous space debris remediation. In earlier work, a DRL agent was fed orbital tracking data for both debris and active spacecraft to effect target intercepts. However, this approach proved problematic due to large network sizes and high computational and training costs. An updated approach used Nvidia GPUs to train a Double-Q network with a replay buffer, enabling autonomous orbit transfers and 1 km intercepts in a simulated environment. Once within the intercept window, the spacecraft switched control to a Convolutional Neural Network (CNN), which relied on direct observational data to identify the target object. This data was supplied via simulated inputs for onboard lidar, infrared, and visible-light sensors. Combined with the supplied ground tracking data for the target object, the spacecraft was able to identify the target before capture. While highly effective, the complete reliance on GPUs for inference precluded deploying these solutions to edge platforms in orbit due to the relatively high compute and telemetry costs. To mitigate this issue, Deep-Q networks and CNNs were trained using traditional methods and then pruned to reduce both their size and compute cost. To verify that the pruned models still maintained performance, they were uploaded to a CubeSat model interfaced with the simulated environment. The physical CubeSat model was configured with the intended operational limitations in mind: power generation and storage, compute power, telemetry capabilities, and sensor packages.
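The Double-Q update with a replay buffer mentioned above can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the project's actual configuration: the state/action dimensions, network sizes, and hyperparameters are all assumptions standing in for the orbital-state observations described in the abstract.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical dimensions standing in for the orbital-tracking observation.
STATE_DIM, ACTION_DIM = 6, 4

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, ACTION_DIM))

online, target = make_qnet(), make_qnet()
target.load_state_dict(online.state_dict())
opt = torch.optim.Adam(online.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)  # replay buffer of (s, a, r, s', done) tuples
gamma = 0.99

def double_q_step(batch_size=32):
    """One Double-Q learning step on a random minibatch from the buffer."""
    if len(buffer) < batch_size:
        return None
    batch = random.sample(buffer, batch_size)
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    s, s2 = s.float(), s2.float()
    # Double-Q: the online net *selects* the next action, the target net
    # *evaluates* it, decoupling selection from evaluation to reduce
    # the overestimation bias of vanilla Q-learning.
    next_a = online(s2).argmax(dim=1, keepdim=True)
    next_q = target(s2).gather(1, next_a).squeeze(1)
    y = r.float() + gamma * next_q * (1 - done.float())
    q = online(s).gather(1, a.view(-1, 1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In practice the target network would be periodically synced (or Polyak-averaged) with the online network; that detail is omitted here for brevity.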
The result was an autonomous spacecraft control system that can select the best candidate for a successful intercept, effect an orbit transfer, and capture the target with a relative velocity of less than 1 m/s. After successful capture has been confirmed, the spacecraft deorbits the debris.
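The pruning step described above, reducing network size and compute cost before deployment to the edge platform, could look like the following. This is a hedged sketch using PyTorch's built-in magnitude pruning; the network shape and the 60% sparsity target are illustrative assumptions, not the project's reported values.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical small policy net standing in for the trained Deep-Q network.
net = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))

# Unstructured L1 magnitude pruning: zero out the 60% of weights with the
# smallest absolute value in each Linear layer.
for module in net:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")  # bake the mask into the weights

def sparsity(model):
    """Fraction of weight-matrix entries that are exactly zero."""
    zeros = sum((p == 0).sum().item() for p in model.parameters() if p.dim() > 1)
    total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
    return zeros / total
```

After pruning, performance would be re-verified against the simulation (as the abstract describes) before the sparse model is accepted for the CubeSat hardware.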

Aug 7th, 12:15 PM

Utilizing Deep Reinforcement Learning to Effect Autonomous Orbit Transfers and Intercepts On-Orbit via Edge Compute