Session

Weekday Session 2: Missions at Scale

Location

Utah State University, Logan, UT

Abstract

Starlink operates the world's largest satellite constellation, with more than 4,000 satellites, all of which receive regular software updates that deliver new capabilities, improve reliability and performance, and maintain security. Updating software across the constellation requires solving two core challenges: safely updating an individual satellite in the harsh environment of space, and orchestrating thousands of updates without disrupting users of the system.

The broader software industry has addressed variations of these problems for large-scale terrestrial computing systems. We have drawn on state-of-the-art practices from that work to develop a novel spacecraft software update system that delivers updates to the entire fleet on a rapid cadence.

We developed a fault-tolerant update system that is resilient to a broad range of failure classes, ensures consistency across a satellite composed of many independent computers, and operates autonomously, correcting itself across a variety of traditionally challenging space operations scenarios. This system lets us adopt standard software industry practices such as canary testing and progressive rollout.
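Canary testing and progressive rollout can be illustrated with a minimal Python sketch. This is not Starlink's implementation, which the abstract does not describe: the function names `plan_waves` and `progressive_rollout`, the geometric wave-growth policy, and the failure-rate threshold are all hypothetical assumptions. The fleet is split into a small canary wave followed by progressively larger waves, and the rollout halts before the next wave whenever the current wave's failure rate exceeds the threshold.

```python
from typing import Callable, List, Sequence


def plan_waves(fleet: Sequence[str], canary_count: int, growth: int) -> List[List[str]]:
    """Hypothetical policy: a small canary wave first, then waves that grow
    geometrically by `growth` until every satellite is covered."""
    if canary_count < 1 or growth < 2:
        raise ValueError("canary_count must be >= 1 and growth >= 2")
    waves: List[List[str]] = []
    start, size = 0, canary_count
    while start < len(fleet):
        waves.append(list(fleet[start:start + size]))  # last wave may be smaller
        start += size
        size *= growth
    return waves


def progressive_rollout(
    waves: List[List[str]],
    apply_update: Callable[[str], bool],
    max_failure_rate: float,
) -> List[str]:
    """Update one wave at a time and return the IDs whose update succeeded.

    `apply_update` is a hypothetical callback that returns True on success;
    it is assumed to handle any per-satellite rollback itself. If a wave's
    failure rate exceeds `max_failure_rate`, later waves are never started.
    """
    updated: List[str] = []
    for wave in waves:
        failures = 0
        for sat in wave:
            if apply_update(sat):
                updated.append(sat)
            else:
                failures += 1
        if failures / len(wave) > max_failure_rate:
            break  # gate failed: stop before exposing more of the fleet
    return updated


if __name__ == "__main__":
    fleet = [f"sat-{i}" for i in range(20)]  # hypothetical satellite IDs
    waves = plan_waves(fleet, canary_count=2, growth=3)
    print([len(w) for w in waves])  # [2, 6, 12]
    # A failure inside the canary wave halts the rollout immediately.
    print(progressive_rollout(waves, lambda s: s != "sat-1", max_failure_rate=0.1))
```

With 20 hypothetical satellites, a canary of 2, and a growth factor of 3, the waves have sizes 2, 6, and 12. A failure in the canary wave stops the rollout before any satellite outside that wave is touched, which is the point of canarying: a bad update is caught while it affects only a small fraction of the fleet.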

We have used this software update system to deploy more than 200 updates, continuously delivering new functionality and improving fleet performance, with no satellites lost due to a failed software update.

Aug 7th, 4:30 PM

Over-The-Vacuum Update – Starlink’s Approach for Reliably Upgrading Software on Thousands of Satellites
