  • CIS Members: Free
  • IEEE Members: Free
  • Non-members: Free

Length: 02:01:18
21 Aug 2022

Junliang Xing (Tsinghua University, China) and Kai Li (Institute of Automation, Chinese Academy of Sciences, China)

Abstract: In recent years, many breakthroughs have been made in artificial intelligence, from playing Atari games to learning complex robotic manipulation tasks. However, in many real-world scenarios an agent still faces the challenges of sparse rewards and imperfect information. A sparse reward function is zero over most of its domain and gives positive values only at a small number of states. It is difficult for an agent to learn effective policies in sparse reward games because it receives almost no feedback about whether its individual actions are good or bad. Moreover, imperfect information, where an agent does not fully observe the state of the world, makes learning harder still, since it requires reasoning under uncertainty about other agents' private information.
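As a concrete illustration of the sparse-reward setting described above, the minimal sketch below shows a goal-reaching reward that is zero almost everywhere. The function and parameter names (`sparse_reward`, `goal`, `tol`) are hypothetical and chosen for illustration only; they are not part of the tutorial material.

```python
import numpy as np

def sparse_reward(state, goal, tol=1e-3):
    # Hypothetical goal-reaching reward: positive only when the state is
    # within `tol` of the goal, zero everywhere else. An agent exploring
    # randomly will almost never see a nonzero learning signal.
    distance = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return 1.0 if distance < tol else 0.0
```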
This tutorial will focus on commonly used approaches for learning in sparse reward and imperfect information games. The first part will discuss learning in sparse reward games, covering prediction error-based, novelty-based, and information gain-based methods. We will also introduce the latest results from our research group, such as self-navigation-based, potential-based, and influence-based learning algorithms. The second part will focus on learning in imperfect information games. We will first introduce the formal definition of imperfect information games, then discuss regret-based methods such as counterfactual regret minimization (CFR) and population-based methods such as policy-space response oracles (PSRO) for learning fixed optimal worst-case policies. Finally, we will introduce opponent modeling-based methods for learning adaptive policies. We will also discuss unsolved problems and directions for future research. We hope this tutorial inspires and motivates attendees to continue learning and contributing to the development of this field.
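To make the novelty-based idea concrete, here is a minimal count-based exploration bonus: rarely visited states receive a larger intrinsic reward, which is added to the sparse extrinsic reward so the agent still gets a learning signal where the environment reward is zero. The class name, the `beta` coefficient, and the 1/sqrt(count) schedule are assumptions made for this sketch, not the specific algorithms covered in the tutorial.

```python
import math
from collections import defaultdict

class CountBasedBonus:
    # Illustrative count-based novelty bonus (a sketch, not the tutorial's
    # method): each visit to a state increments its count, and the intrinsic
    # reward shrinks as the state becomes familiar.
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state_key):
        self.counts[state_key] += 1
        return self.beta / math.sqrt(self.counts[state_key])
```

In use, the agent would optimize `extrinsic_reward + explorer.bonus(tuple(state))`, where `explorer` is a `CountBasedBonus` instance; prediction error-based methods follow the same pattern but derive the bonus from a learned model's error rather than visit counts.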
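For the regret-based part, the following sketch applies regret matching, the per-information-set update at the core of CFR, to the imperfect information game rock-paper-scissors. The payoff matrix, function names, and training loop are a toy illustration under assumed conventions, not the tutorial's implementation.

```python
import numpy as np

# Payoff to player 0 for (row = player 0's action, column = player 1's action):
# actions are rock, paper, scissors; player 1's payoff is the negative.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def strategy_from_regrets(regrets):
    # Regret matching: play each action in proportion to its positive regret.
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(3, 1.0 / 3.0)

def train(iterations=20000):
    regrets = [np.zeros(3), np.zeros(3)]
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        # Expected payoff of each pure action against the opponent's mix.
        values = [PAYOFF @ strats[1], -(PAYOFF.T @ strats[0])]
        for p in range(2):
            strategy_sums[p] += strats[p]
            # Accumulate regret: action value minus current strategy's value.
            regrets[p] += values[p] - strats[p] @ values[p]
    # The *average* strategies converge to a Nash equilibrium.
    return [s / s.sum() for s in strategy_sums]
```

Under self-play the average strategies approach the uniform equilibrium (1/3, 1/3, 1/3), i.e. the fixed optimal worst-case policy for this game; full CFR applies the same regret-matching update at every information set of a sequential game such as poker.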