Agile Flight Emerges from Multi-Agent Competitive Racing
Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight and strategy emerge from agents trained with reinforcement learning. We provide evidence that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. Link