Reinforcement learning has proven to be a powerful technique in robotics; however, it has rarely been employed to learn in a hardware-in-the-loop environment, because spurious training data could cause a robot to take an unsafe (and potentially catastrophic) action. We will present a method for overcoming this limitation, known as Guaranteed Safe Online Learning via Reachability (GSOLR), in which the control outputs from the reinforcement learning algorithm are wrapped inside another controller, based on reachability analysis, that seeks to guarantee safety against worst-case disturbances.
After defining the relevant backward reachability constructs and explaining how they can be calculated, we will formalize the concept of GSOLR and show how it can be used on a real-world target tracking problem, in which an observing quadrotor helicopter must keep a target ground vehicle with unknown (but bounded) dynamics inside its field of view at all times, while simultaneously attempting to build a motion model of the target. We will then present extensions to GSOLR that automatically tune the system's safety guarantees to be neither too liberal nor too conservative, giving the machine learning algorithm running in parallel the widest possible latitude while still guaranteeing system safety. These extensions will be demonstrated on the task of safely learning an altitude controller for a quadrotor helicopter. Together, these examples demonstrate the GSOLR framework's robustness to errors in machine learning algorithms, and indicate its potential for allowing high-performance machine learning systems to be used in safety-critical situations in the future.
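The wrapping idea described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: all names are hypothetical, and the backward reachable set (which in GSOLR comes from Hamilton-Jacobi reachability analysis against worst-case disturbances) is stood in for here by a simple altitude threshold in a toy 1-D example.

```python
def gsolr_step(state, learned_control, safe_control, in_reach_set):
    """Hypothetical GSOLR-style switching law: apply the learned control
    unless the state lies in the backward reachable set of the unsafe
    states, in which case the safety controller overrides it."""
    if in_reach_set(state):
        return safe_control(state)
    return learned_control(state)


# Toy 1-D altitude example (illustrative numbers, not from the paper):
# the unsafe set is the ground, and we pretend reachability analysis
# returned "altitude below 0.5 m" as the set from which a worst-case
# disturbance could cause a crash.
in_reach = lambda altitude: altitude < 0.5
safe_ctrl = lambda altitude: 1.0      # full upward thrust, the safe action
learned_ctrl = lambda altitude: -1.0  # a possibly erroneous learned action

print(gsolr_step(0.2, learned_ctrl, safe_ctrl, in_reach))  # safety overrides
print(gsolr_step(2.0, learned_ctrl, safe_ctrl, in_reach))  # learner acts freely
```

The key property is that the learning algorithm's output is never trusted inside the reachable set, so even arbitrarily bad learned actions cannot violate safety; outside it, the learner is unconstrained.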