IEEE Members: Free
Non-members: Free
Duration: 02:24:21
Yaodong Yang (Peking University, China)

Abstract: Recent advances in multiagent reinforcement learning have seen the introduction of a new learning paradigm that revolves around population-based training. The idea is to consider the structure of games not at the micro-level of individual actions, but at the meta-level of which agent to train against in any given game or situation. A typical framework for population-based training is the Policy Space Response Oracles (PSRO) method, in which, at each iteration, a new reinforcement learning agent is discovered as the best response to a Nash mixture of agents from the opponent populations. PSRO methods can provably converge to Nash, correlated, and coarse correlated equilibria in N-player games; in particular, they have shown remarkable performance in solving large-scale zero-sum games. In this tutorial, I will introduce the basic idea of PSRO methods, the necessity of using PSRO methods to solve real-world games such as Chess, recent results on solving N-player games and mean-field games, how to promote behavioral diversity during training, and the relationship of PSRO methods to conventional no-regret methods. Finally, I will introduce a new meta-PSRO framework named Neural Auto-Curricula, in which an AI learns to learn a PSRO-like solution algorithm purely from data, and a new PSRO framework called Online Double Oracle, which inherits the benefits of both population-based methods and no-regret methods.
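The PSRO loop described in the abstract (grow each population by best-responding to a Nash mixture over the opponent's current population) can be sketched on a tiny zero-sum matrix game. This is an illustrative toy, not the tutorial's implementation: the best-response "oracle" here is an exact argmax over pure strategies rather than a trained RL agent, and the meta-game Nash is approximated by fictitious play; the game and all function names are assumptions for the example.

```python
# Minimal PSRO sketch on rock-paper-scissors (a zero-sum matrix game).
# Assumptions: exact best-response oracle (argmax over pure strategies)
# instead of an RL agent, and fictitious play as the meta-game solver.
import numpy as np

# Row player's payoff matrix: rows/cols are Rock, Paper, Scissors.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def fictitious_play(M, iters=2000):
    """Approximate a Nash equilibrium of the zero-sum meta-game M."""
    row_counts = np.zeros(M.shape[0]); row_counts[0] = 1.0
    col_counts = np.zeros(M.shape[1]); col_counts[0] = 1.0
    for _ in range(iters):
        # Each meta-player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(M @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ M)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def psro(A, iterations=5):
    # Each population starts with a single arbitrary pure strategy.
    row_pop, col_pop = [0], [0]
    for _ in range(iterations):
        # Meta-game: the full game restricted to the current populations.
        M = A[np.ix_(row_pop, col_pop)]
        sigma_row, sigma_col = fictitious_play(M)
        # Oracle step: best response to the opponent's Nash mixture.
        col_mix = np.zeros(A.shape[1])
        for j, p in zip(col_pop, sigma_col):
            col_mix[j] += p
        row_pop.append(int(np.argmax(A @ col_mix)))
        row_mix = np.zeros(A.shape[0])
        for i, p in zip(row_pop[:-1], sigma_row):
            row_mix[i] += p
        col_pop.append(int(np.argmin(row_mix @ A)))
    return row_pop, col_pop

row_pop, col_pop = psro(A)
print(sorted(set(row_pop)))  # all three pure strategies are discovered
```

In a full-scale PSRO system, the `argmax`/`argmin` oracle calls are replaced by training a fresh reinforcement learning agent against the fixed Nash mixture, which is what makes the approach scale beyond games small enough to enumerate.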