Augmenting Deep Reinforcement Learning with Clustering
Deep reinforcement learning algorithms today suffer heavily from sample inefficiency: they need a large number of training samples to learn the desired behavior. Although there are several reasons for this inefficiency, one of them may stem from the same property that makes deep RL so powerful: the use of neural networks as function approximators. Most deep RL algorithms use a neural network to predict a policy and then, through backpropagation, shift the policy predictions toward an optimal policy. However, it is difficult to control which region of the input space benefits from learning on each training example. If too small a region benefits, more training samples from the unaffected regions are required to learn essentially the same information, which drives up the total number of samples needed. This work tackles the problem by constraining the neural network to form a minimal set of clusters over the input space, so that what is learned from one training sample in a cluster is distributed not merely over some unknown region around that sample, as the network would do on its own, but throughout the entire cluster. Specifically, Proximal Policy Optimization (PPO) with an RNN policy is augmented with clusters, and it is shown that this produces better results than vanilla PPO.
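To make the core idea concrete, the following is a minimal, simplified sketch, not the paper's actual method: observations are grouped by a hand-rolled k-means, and the policy keeps one set of action logits per cluster, so a policy-gradient-style update from a single state shifts the policy for every state in that cluster. The names (`kmeans`, `ClusteredPolicy`), the tabular per-cluster logits, and the update rule are all illustrative assumptions; the actual work augments a PPO agent with an RNN policy rather than a tabular one.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means over rows of X; returns (centers, labels).
    Illustrative stand-in for whatever clustering the method uses."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

class ClusteredPolicy:
    """Per-cluster action logits: an update computed for one state is
    shared by every state assigned to the same cluster."""
    def __init__(self, centers, n_actions, lr=0.5):
        self.centers = centers
        self.logits = np.zeros((len(centers), n_actions))
        self.lr = lr

    def cluster_of(self, obs):
        # Assign an observation to its nearest cluster center.
        return int(np.argmin(((self.centers - obs) ** 2).sum(-1)))

    def probs(self, obs):
        # Softmax over the shared logits of the observation's cluster.
        z = self.logits[self.cluster_of(obs)]
        e = np.exp(z - z.max())
        return e / e.sum()

    def update(self, obs, action, advantage):
        # REINFORCE-style gradient on the cluster's shared logits, so
        # the change propagates to the whole cluster, not just `obs`.
        c = self.cluster_of(obs)
        p = self.probs(obs)
        grad = -p
        grad[action] += 1.0
        self.logits[c] += self.lr * advantage * grad
```

Updating the policy for one observation then visibly changes the action probabilities of a neighboring observation in the same cluster, while observations in other clusters are untouched; this is the sample-efficiency mechanism the abstract describes, in miniature.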