Session
Technical Poster Session 8: Guidance, Navigation & Control
Location
Utah State University, Logan, UT
Abstract
Autonomy is a key challenge for future space exploration endeavors. Deep Reinforcement Learning holds promise for developing agents that learn complex behaviors simply by interacting with their environment. This work investigates the use of Reinforcement Learning for satellite attitude control under two working conditions: the nominal case, in which all the actuators (a set of 3 reaction wheels) are working properly, and the underactuated case, in which an actuator failure is simulated along a randomly chosen axis. In particular, a control policy is implemented and evaluated to maneuver a small satellite from a random starting attitude to a given pointing target. In the proposed approach, the control policies are implemented as Neural Networks trained with a custom version of the Proximal Policy Optimization algorithm, and they allow the designer to specify the desired control properties simply by shaping the reward function. The agents learn to effectively perform large-angle slew maneuvers with fast convergence and industry-standard pointing accuracy.
Underactuated Attitude Control with Deep Reinforcement Learning
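The reward-shaping idea described in the abstract can be illustrated with a minimal sketch. The quaternion error metric, the weights `w_err` and `w_rate`, and the body-rate penalty below are illustrative assumptions for a generic pointing task, not the paper's actual reward function:

```python
import numpy as np

def attitude_reward(q_current, q_target, omega, w_err=1.0, w_rate=0.1):
    """Illustrative shaped reward for a pointing maneuver.

    Penalizes the rotation angle between the current and target attitude
    quaternions, plus a small penalty on body rates to discourage fast
    spins. Weights are hypothetical tuning parameters.
    """
    # |q_current . q_target| = |cos(angle/2)| for unit quaternions,
    # so the pointing error angle is 2 * arccos(|dot|).
    dot = abs(np.dot(q_current, q_target))
    angle = 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))  # pointing error [rad]
    # Reward is 0 at the target with zero rates, increasingly negative
    # with larger pointing error or faster rotation.
    return -(w_err * angle + w_rate * np.linalg.norm(omega))
```

Shaping of this kind lets the designer trade pointing accuracy against slew aggressiveness by adjusting the weights, which is the property the abstract attributes to the reward-function approach.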