Deep Reinforcement Learning (DRL) holds great promise for learning behaviours flexibly, but it can be hard to reproduce and often requires thousands of trials, which limits its practical use on robots. At McGill's Mobile Robotics Lab, we have recently:
- Learned to swim with flippers in less than a dozen trials
- Reported reproducibility issues that changed the community's empirical practices
- Developed TD3, a continuous state/action DRL algorithm that achieves world-leading performance in a dozen lines of training code
- Autonomously explored coral reefs in the turbulent littoral ocean via imitation and self-supervised learning
In this talk, I will describe the statistics and optimization insights gained from these projects. TD3 grew out of our discovery that deep actor-critic methods suffer from an overestimation bias when learning action-values, a bias that arises from taking the gradient of a noisy estimator. For imitation learning, a similar analysis identified extrapolation error as a limiting factor in outperforming noisy experts, and led to the Batch-Constrained Q-Learning (BCQ) approach, which can do so. For model-based RL methods using Bayesian neural networks, we have analyzed sampling variance over time and increased the stability of sampling possible futures for data-efficient policy improvement. Finally, I'll give some views on a more symbiotic relationship between robotics and DRL in the future.
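The overestimation effect behind TD3 can be seen with a toy numerical sketch (not the authors' code; the action counts and noise scale here are illustrative assumptions). If every true action-value is zero but each estimate carries zero-mean noise, then selecting the maximum over noisy estimates is biased upward; taking the minimum over two independent critics, as in TD3's clipped double-Q target, removes the upward bias at the cost of a slight underestimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 100_000

# True Q-values are all 0; each critic sees independent zero-mean noise.
q1_noise = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
q2_noise = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# Single-critic target: max over noisy estimates -> upward bias.
single_max = q1_noise.max(axis=1).mean()

# Clipped double-Q style target: pick the action with critic 1,
# then take the min of both critics' estimates at that action.
a_star = q1_noise.argmax(axis=1)
q1_at_a = q1_noise[np.arange(n_trials), a_star]
q2_at_a = q2_noise[np.arange(n_trials), a_star]
clipped = np.minimum(q1_at_a, q2_at_a).mean()

print(f"single-critic max bias:   {single_max:+.3f}")  # well above 0
print(f"clipped double-Q bias:    {clipped:+.3f}")     # near 0, slightly below
```

The small negative bias of the clipped estimate illustrates why the min of two critics is preferable in practice: a mild underestimate does not propagate through the policy gradient the way a systematic overestimate does.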
David Meger is an Assistant Professor in Computer Science at McGill University. He is the Co-Director of the Mobile Robotics Laboratory, a member of the Centre for Intelligent Machines, a co-PI in the NSERC Canadian Robotics Network, and an Associate Member of Mila, the Quebec AI Institute. David's PhD research at the University of British Columbia led to Curious George, a robot that won several international contests in live object search. During his postdoctoral research at McGill, he pioneered the use of RL in underwater control, leading to a best paper nomination at ICRA. His current group's research spans 3D computer vision, visual navigation, imitation learning, and RL for continuous control, all applied to indoor autonomy and field robotics. David received the CIPPRS Award for Service to the Canadian Computer Vision community in 2017, served as co-chair of the Computer and Robot Vision conference, and was local arrangements chair of ICRA 2019.