National Aeronautics and Space Administration (NASA)
Goddard Space Flight Center - Greenbelt, MD
~~~ Quick Description ~~~
Led research direction for the machine learning components of the Research in A.I. for Spacecraft Resilience (RAISR) project.
Developed Long Short-Term Memory (LSTM) network in Python for processing satellite telemetry and onboard diagnosis of spacecraft faults (2020).
Created importance sampling algorithm with ‘explainable AI’ approach by ranking telemetry data based on weights in trained LSTM’s output gate (2020).
Developed agent based on a partially observable Markov decision process (POMDP), trained with Proximal Policy Optimization (PPO), to detect and diagnose errors on satellites (2021).
Worked with NASA engineers from Goddard Space Flight Center and Wallops Flight Facility.
~~~ In-Depth Description ~~~
NASA's SmallSats (a type of small satellite) currently use a naive threshold algorithm to determine when there’s an error. For example, an error will be thrown if the thrusters are pushing above some constant limit, or a sensor is giving unusual data.
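To make the idea concrete, here is a minimal sketch of a fixed-threshold check of the kind described above. The channel names and limits are hypothetical, chosen only for illustration; they are not taken from any actual SmallSat.

```python
# Hypothetical channel names and limits, for illustration only.
LIMITS = {
    "thruster_output": (0.0, 0.8),
    "battery_voltage": (6.5, 8.4),
}


def check_thresholds(sample, limits=LIMITS):
    """Return the channels in `sample` whose readings fall outside their fixed limits."""
    violations = []
    for channel, (low, high) in limits.items():
        value = sample.get(channel)
        # A missing channel is skipped here; a real system would likely treat it as a fault.
        if value is not None and not (low <= value <= high):
            violations.append(channel)
    return violations


sample = {"thruster_output": 0.93, "battery_voltage": 7.9}
print(check_thresholds(sample))  # ['thruster_output']
```

A check like this has no notion of context: it reports the same violation whether the thruster reading is a real fault or an expected part of an experiment.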
When there is an error, the satellites turn off all non-essential equipment (thus aborting whatever experiment they were doing) and focus on staying in orbit until someone in ground control fixes the error.
Normally this is great. For example, if a solar panel falls off the satellite, then that’s catastrophic, and worth going into safe mode over. However, sometimes errors do not require that big of a reaction. For example, if radiation flipped a few bits, or the thrusters are a little higher than the threshold but it’s part of the experiment, then safe mode is unnecessary. We call these “benign errors.”
A benign error is one where the threshold algorithm flags an error, but it's actually not worth aborting the mission and losing all of our science experiment data over. Goddard, being an especially science-focused center, has particular interest in protecting this data from being dropped unnecessarily.
My job in Summer 2020, which I returned to in Summer 2021, was developing for the RAISR project, which focuses on using artificial intelligence and machine learning to autonomously diagnose problems on SmallSats. Whenever an error happens, the RAISR algorithm must not only determine whether the error is benign or catastrophic, but also explain to humans why it made that decision.
For Summer 2020, while my supervisor (Evana Gizzi) and another intern worked on the general A.I. side of the project, I was given free rein over the machine learning side.
First, I developed an LSTM network that learned to determine the error state of a SmallSat (nominal, benign, or catastrophic) given sequential, timestamped telemetry data from the satellite.
I then created a novel importance sampling algorithm, which opens the "brain" of the trained LSTM and determines which channels of the telemetry data the network considers more important than others. This was to help build an explainable story behind the LSTM's diagnoses. As the importance sampling algorithm was novel, I also tested it extensively on a variety of datasets to validate empirically that it worked.
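As a rough illustration of ranking telemetry channels by output-gate weights, here is a minimal sketch. It is not the RAISR algorithm itself: the scoring rule (mean absolute weight per channel) is an assumption, as is the weight layout, which follows the gate order PyTorch uses for `nn.LSTM` input weights (input, forget, cell, output). The channel names and weight values are made up.

```python
def rank_channels_by_output_gate(weight_ih, hidden_size, channel_names):
    """Rank input channels by the mean absolute output-gate weight attached to each.

    `weight_ih` is a list of 4 * hidden_size rows, one column per channel.
    Assumed row layout: [input gate, forget gate, cell gate, output gate].
    """
    # The output gate occupies the last block of hidden_size rows.
    output_gate_rows = weight_ih[3 * hidden_size : 4 * hidden_size]
    scores = [
        sum(abs(row[j]) for row in output_gate_rows) / hidden_size
        for j in range(len(channel_names))
    ]
    # Highest score first.
    return sorted(zip(channel_names, scores), key=lambda pair: pair[1], reverse=True)


hidden_size = 2
channels = ["thruster_output", "battery_voltage", "gyro_rate"]  # hypothetical names
# Made-up weights: the first three gate blocks are zeroed, only the output gate is filled.
weight_ih = [[0.0] * len(channels) for _ in range(3 * hidden_size)]
weight_ih += [[0.1, -0.9, 0.3], [0.2, 0.7, -0.4]]

for name, score in rank_channels_by_output_gate(weight_ih, hidden_size, channels):
    print(f"{name}: {score:.2f}")
```

With these made-up weights, `battery_voltage` ranks first, followed by `gyro_rate` and `thruster_output`.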
I presented "Using Machine Learning to Diagnose On-Board Faults in Satellites" at the NASA 2020 virtual intern poster session.
For Summer 2021, I returned to the project. As the team size had doubled that year, I spent some time training the brand new interns. I then returned to the machine learning side of the project and, with another intern, developed a POMDP-based approach to the RAISR problem. This approach was used to create a reinforcement learning agent, trained using PPO, to detect errors in an explainable way. We also implemented a Kalman filter, which was used as one of the system's tools for error detection.
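For readers unfamiliar with how a Kalman filter can support error detection, here is a minimal one-dimensional sketch. It assumes a roughly constant signal and flags any measurement whose residual against the filter's prediction is unusually large. The noise variances, the 3-sigma gate, and the sample readings are illustrative assumptions, not details of the RAISR implementation.

```python
def kalman_flag_anomalies(measurements, process_var=1e-3, meas_var=0.04, gate=3.0):
    """Return indices of measurements whose residual exceeds `gate` standard deviations."""
    # Initialise the estimate from the first reading.
    estimate, variance = measurements[0], meas_var
    flagged = []
    for step, z in enumerate(measurements[1:], start=1):
        # Predict: constant-signal model, so only the uncertainty grows.
        variance += process_var
        # Residual (innovation) and its variance.
        residual = z - estimate
        residual_var = variance + meas_var
        if abs(residual) > gate * residual_var ** 0.5:
            flagged.append(step)
            continue  # skip the update so the outlier does not pull the estimate
        # Update the estimate with the accepted measurement.
        gain = variance / residual_var
        estimate += gain * residual
        variance *= 1.0 - gain
    return flagged


readings = [7.9, 8.0, 7.8, 7.9, 5.2, 7.9, 8.0]  # hypothetical battery voltages
print(kalman_flag_anomalies(readings))  # [4]
```

Unlike a fixed threshold, the gate here scales with the filter's own uncertainty, so the same reading can be normal or suspicious depending on what the filter has seen so far.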
If you want to read more, the RAISR project was featured in "Cutting Edge" (pages 8-9). You can find a link to that here. Publication and open-sourcing of this work are ongoing.