IEEE Members: Free
Non-members: Free
Duration: 02:24:21
Yaodong Yang (Peking University, China)

Abstract: Recent advances in multiagent reinforcement learning have seen the introduction of a new learning paradigm that revolves around population-based training. The idea is to consider the structure of games not at the micro-level of individual actions, but at the meta-level of which agent to train against in any given game or situation. A typical framework for population-based training is the Policy Space Response Oracles (PSRO) method, in which, at each iteration, a new reinforcement learning agent is discovered as the best response to a Nash mixture of agents from the opponent populations. PSRO methods can provably converge to Nash, correlated, and coarse correlated equilibria in N-player games; in particular, they have shown remarkable performance in solving large-scale zero-sum games. In this tutorial, I will introduce the basic idea of PSRO methods, the necessity of using PSRO methods to solve real-world games such as Chess, recent results on solving N-player games and mean-field games, how to promote behavioral diversity during training, and the relationship of PSRO methods to conventional no-regret methods. Finally, I will introduce a new meta-PSRO framework named Neural Auto-Curricula, in which an AI learns to learn a PSRO-like solution algorithm purely from data, and a new PSRO framework called Online Double Oracle, which inherits the benefits of both population-based methods and no-regret methods.
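The PSRO loop described in the abstract (grow each population by best-responding to a Nash mixture over the opponent's current population) can be sketched on a tiny zero-sum matrix game. This is an illustrative toy, not the tutorial's implementation: the best-response "oracle" here is an exact argmax over pure strategies rather than a trained RL agent, and the meta-game Nash is approximated by fictitious play; the game and all function names are assumptions for the example.

```python
# Minimal PSRO sketch on rock-paper-scissors (a zero-sum matrix game).
# Assumptions: exact best-response oracle (argmax over pure strategies)
# instead of an RL agent, and fictitious play as the meta-game solver.
import numpy as np

# Row player's payoff matrix: rows/cols are Rock, Paper, Scissors.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def fictitious_play(M, iters=2000):
    """Approximate a Nash equilibrium of the zero-sum meta-game M."""
    row_counts = np.zeros(M.shape[0]); row_counts[0] = 1.0
    col_counts = np.zeros(M.shape[1]); col_counts[0] = 1.0
    for _ in range(iters):
        # Each meta-player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(M @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ M)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def psro(A, iterations=5):
    # Each population starts with a single arbitrary pure strategy.
    row_pop, col_pop = [0], [0]
    for _ in range(iterations):
        # Meta-game: the full game restricted to the current populations.
        M = A[np.ix_(row_pop, col_pop)]
        sigma_row, sigma_col = fictitious_play(M)
        # Oracle step: best response to the opponent's Nash mixture.
        col_mix = np.zeros(A.shape[1])
        for j, p in zip(col_pop, sigma_col):
            col_mix[j] += p
        row_pop.append(int(np.argmax(A @ col_mix)))
        row_mix = np.zeros(A.shape[0])
        for i, p in zip(row_pop[:-1], sigma_row):
            row_mix[i] += p
        col_pop.append(int(np.argmin(row_mix @ A)))
    return row_pop, col_pop

row_pop, col_pop = psro(A)
print(sorted(set(row_pop)))  # all three pure strategies are discovered
```

In a full-scale PSRO system, the `argmax`/`argmin` oracle calls are replaced by training a fresh reinforcement learning agent against the fixed Nash mixture, which is what makes the approach scale beyond games small enough to enumerate.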