Hybrid Methods for Efficient Exploration in Reinforcement Learning
Field | Value | Language |
dc.contributor.author | Blau, Tom | |
dc.date.accessioned | 2020-09-25 | |
dc.date.available | 2020-09-25 | |
dc.date.issued | 2020 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/23476 | |
dc.description.abstract | Reinforcement learning (RL) is a powerful approach for learning control policies that solve sequential decision problems under unknown dynamics, such as robotic locomotion, object manipulation, and autonomous driving. The drawback of RL algorithms is their poor data efficiency: the number of data points required to learn a high-performing control policy is often in the millions, even for comparatively simple tasks. This thesis investigates the sources of inefficiency in reinforcement learning as well as ways to mitigate them. The focus is on problems with continuous controls, high-dimensional observations, and sparse rewards. The contributions of this thesis are as follows. First, the thesis introduces a method to efficiently learn a control policy by imitating a small set of demonstrations from an expert, using variational inference to regularise learning. It is shown experimentally that the variational regularisation yields better initial policies that converge more quickly during fine-tuning, compared with standard learning from demonstration. Second, the thesis provides a method to efficiently generate demonstrations when an expert is not available, using algorithms from the field of planning to explore the state space more efficiently than state-of-the-art RL can. Theoretical results guarantee that a demonstration can be found in finite time and bound the asymptotic time complexity of finding one. Experimental results show that the method greatly improves overall data efficiency. Finally, the thesis presents an approach for improving exploration by modifying the reinforcement signal to discourage revisiting previously explored states. Bayesian linear regression is used to maintain a model with uncertainty information, and the agent is rewarded for exploring regions where uncertainty is high. Experiments show that the approach improves data efficiency in both simulated and physical environments. | en_AU
dc.language.iso | en | en_AU |
dc.publisher | University of Sydney | en_AU |
dc.subject | reinforcement learning | en_AU |
dc.subject | planning | en_AU |
dc.subject | robotics | en_AU |
dc.subject | Bayesian methods | en_AU |
dc.title | Hybrid Methods for Efficient Exploration in Reinforcement Learning | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Ramos, Fabio | |
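
The abstract's final contribution describes an exploration bonus based on Bayesian linear regression uncertainty. The snippet below is a minimal editorial sketch of that general idea, not code from the thesis: the class name, feature dimensionality, prior and noise scales, and the bonus weighting are all assumptions made purely for illustration. It maintains a Bayesian linear regression posterior over state features and uses the posterior predictive variance at a visited state as an intrinsic reward, so that rarely visited (high-uncertainty) regions earn a larger bonus.

```python
# Illustrative sketch (not the thesis implementation): an exploration bonus
# derived from Bayesian linear regression over state features. The extrinsic
# reward is augmented with the posterior predictive variance at the visited
# state, so frequently visited regions earn progressively less bonus.
import numpy as np


class BayesianLinearBonus:
    """Bayesian linear regression with a Gaussian prior N(0, sigma_p^2 I) and
    Gaussian observation noise sigma_n^2; the predictive variance at a feature
    vector phi(s) serves as the exploration bonus."""

    def __init__(self, feature_dim, sigma_prior=1.0, sigma_noise=0.1):
        self.sigma_n2 = sigma_noise ** 2
        # Posterior precision starts at the prior precision (1/sigma_p^2) I.
        self.precision = np.eye(feature_dim) / sigma_prior ** 2

    def bonus(self, phi):
        """Predictive variance of the model at features phi (the bonus)."""
        cov = np.linalg.inv(self.precision)  # posterior covariance
        return float(phi @ cov @ phi + self.sigma_n2)

    def update(self, phi):
        """Condition on an observation at phi (rank-1 precision update)."""
        self.precision += np.outer(phi, phi) / self.sigma_n2


# Usage: shape the reward seen by the RL algorithm with the bonus. The
# feature map phi(s) and the weighting 0.1 are placeholders, not values
# taken from the thesis.
if __name__ == "__main__":
    model = BayesianLinearBonus(feature_dim=4)
    rng = np.random.default_rng(0)
    for step in range(5):
        phi = rng.normal(size=4)   # stand-in for state features phi(s)
        extrinsic = 0.0            # sparse-reward setting
        shaped = extrinsic + 0.1 * model.bonus(phi)
        model.update(phi)
        print(f"step {step}: shaped reward = {shaped:.3f}")
```

Because the bonus shrinks as more observations accumulate in a region, revisiting known states yields diminishing intrinsic reward, which matches the abstract's stated goal of discouraging redundant exploration.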