CIS
IEEE Members: Free
Non-members: Free
Length: 01:17:42
Reinforcement learning is expected to play an important role in the AI and machine learning era, as is evident from recent major advances, particularly in games. This is due to its flexibility and to the arguably minimal designer intervention it requires, especially when the feature extraction process is left to a robust model such as a deep neural network. Although deep learning has alleviated the long-standing burden of manual feature design, another important issue remains to be tackled: the experience-hungry nature of RL models, which is mainly due to bootstrapping and exploration. One technique that will play a centre-stage role in tackling this issue is experience replay. Naturally, it allows us to capitalise on already gained experience and to shorten the time needed to train an RL agent. The frequency and depth of the replay can vary significantly, and a unifying view and clear understanding of the issues related to off-policy and on-policy replay are currently lacking. For example, at one far end of the spectrum, extensive experience replay, although it should conceivably help reduce the data intensity of the training period, places significant constraints on the practicality of the model when done naively, requiring extra time and space that can grow significantly and rendering the method impractical. On the other hand, in its optimal form, whether the replay is a target re-evaluation or a re-update, when the importance-sampling ratio uses bootstrapping the method's computational requirements match those of model-based RL methods for planning. In this tutorial we will tackle the issues and techniques related to the theory and application of deep reinforcement learning and experience replay, and how and when these techniques can be applied effectively to produce a robust model. In addition, we will promote a unified view of experience replay that involves replaying and re-evaluating the target updates. What is more, we will show that the generalised intensive experience replay method can be used to derive several important algorithms, including n-step true online TD and LSTD, as special cases. This surprising but important view can help the neuro-dynamic/RL community move this concept further forward and will benefit both researchers and practitioners in their quest for better and more practical RL methods and models.
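To make the replay mechanism discussed above concrete, the following minimal sketch (in Python with NumPy) pairs a simple replay buffer with importance-sampled TD(0) updates on a toy chain task. The environment, the behaviour and target policies, and all hyper-parameters are illustrative assumptions for the example, not material from the tutorial itself.

# A minimal illustrative sketch, not the tutorial's own code: an
# experience-replay buffer combined with importance-sampled TD(0) updates
# for off-policy evaluation with linear (one-hot) features. The chain
# environment, the policies and the hyper-parameters are assumptions.
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

n_states = 5
phi = np.eye(n_states)              # one-hot features, so the setting is tabular
w = np.zeros(n_states)              # linear value-function weights
gamma, alpha = 0.95, 0.1

b_probs = np.array([0.5, 0.5])      # behaviour policy over actions {left, right}
pi_probs = np.array([0.1, 0.9])     # target policy we want to evaluate

buffer = deque(maxlen=10_000)       # the experience-replay buffer

def step(s, a):
    """Illustrative chain dynamics: action 0 moves left, action 1 moves right."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    done = (s2 == n_states - 1)     # the right end is terminal and pays reward 1
    return s2, (1.0 if done else 0.0), done

# 1) Gather experience under the behaviour policy and store it in the buffer.
s = 0
for _ in range(2_000):
    a = int(rng.choice(2, p=b_probs))
    s2, r, done = step(s, a)
    buffer.append((s, a, r, s2, done))
    s = 0 if done else s2

# 2) Replay: re-sample stored transitions many times and apply off-policy
#    TD(0) updates. The bootstrapped target is re-evaluated with the current
#    weights at replay time, and the importance-sampling ratio rho corrects
#    for the mismatch between the target and behaviour policies.
for _ in range(10_000):
    s, a, r, s2, done = random.choice(buffer)
    rho = pi_probs[a] / b_probs[a]
    td_target = r if done else r + gamma * (phi[s2] @ w)
    td_error = td_target - phi[s] @ w
    w += alpha * rho * td_error * phi[s]

print("estimated state values under the target policy:", phi @ w)

Re-evaluating the bootstrapped target with the current weights at replay time, rather than re-applying the originally computed update, corresponds to the target re-evaluation flavour of replay mentioned in the abstract; replaying more frequently or more deeply (for example over n-step returns) increases both the computational cost and the potential data efficiency.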