Show simple item record

Field | Value | Language
dc.contributor.author | Blau, Tom |
dc.date.accessioned | 2020-09-25 |
dc.date.available | 2020-09-25 |
dc.date.issued | 2020 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/23476 |
dc.description.abstract | Reinforcement learning is a powerful approach for learning control policies that solve sequential decision problems under unknown dynamics, such as robotic locomotion, object manipulation, and autonomous driving. A key drawback of reinforcement learning (RL) algorithms is their poor data efficiency: the number of data points required to learn a high-performing control policy is often in the millions, even for comparatively simple tasks. This thesis investigates the sources of inefficiency in reinforcement learning as well as ways to mitigate them, focusing on problems with continuous control, high-dimensional observations, and sparse rewards. The contributions of this thesis are the following. First, the thesis introduces a method to efficiently learn a control policy by imitating a small set of demonstrations from an expert, using variational inference to regularise learning. It is shown experimentally that the variational regularisation results in better initial policies that converge more quickly during fine-tuning, compared with standard learning-from-demonstration. Second, the thesis provides a method to efficiently generate demonstrations when an expert is not available, using algorithms from the field of planning to explore the state space more efficiently than state-of-the-art RL can. Theoretical results are provided that guarantee a demonstration can be found in finite time and that bound the asymptotic time complexity of finding one. Experimental results show that the method greatly improves overall data efficiency. Finally, the thesis presents an approach for improving exploration by modifying the reinforcement signal to discourage revisiting previously explored states. Bayesian linear regression is used to maintain a model with uncertainty information, and the agent is rewarded for exploring regions where uncertainty is high. Experiments show that the approach improves data efficiency in both simulated and physical environments. | en_AU
dc.language.iso | en | en_AU
dc.publisher | University of Sydney | en_AU
dc.subject | reinforcement learning | en_AU
dc.subject | planning | en_AU
dc.subject | robotics | en_AU
dc.subject | Bayesian methods | en_AU
dc.title | Hybrid Methods for Efficient Exploration in Reinforcement Learning | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Ramos, Fabio |
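
The final contribution described in the abstract rewards the agent for visiting states where a Bayesian linear regression model is uncertain. The thesis text itself is not part of this record, so the following is only a minimal sketch of that general idea under assumptions of this page's editor; the class name, feature construction, and bonus scale beta are hypothetical and do not reproduce the author's implementation.

    # Illustrative only: Bayesian linear regression whose posterior predictive
    # variance is turned into an exploration bonus added to the task reward.
    import numpy as np

    class BayesianLinearUncertainty:
        def __init__(self, n_features, prior_precision=1.0, noise_var=0.1):
            self.noise_var = noise_var
            # Posterior over linear weights, initialised to an isotropic prior.
            self.precision = prior_precision * np.eye(n_features)
            self.b = np.zeros(n_features)  # running sum of phi * target / noise_var

        def update(self, phi, target):
            # Condition the posterior on one (feature vector, observed target) pair.
            phi = np.asarray(phi, dtype=float)
            self.precision += np.outer(phi, phi) / self.noise_var
            self.b += phi * target / self.noise_var

        def predictive_variance(self, phi):
            # Predictive variance at phi: noise variance + phi^T * covariance * phi.
            phi = np.asarray(phi, dtype=float)
            cov = np.linalg.inv(self.precision)
            return self.noise_var + phi @ cov @ phi

        def shaped_reward(self, phi, extrinsic_reward, beta=0.1):
            # Extrinsic reward plus a bonus that is large where the model is uncertain.
            return extrinsic_reward + beta * np.sqrt(self.predictive_variance(phi))

    # A state the model has not seen gets a large bonus; after an update at the
    # same features the bonus shrinks, discouraging revisits.
    model = BayesianLinearUncertainty(n_features=4)
    phi = np.array([0.2, -1.0, 0.5, 0.3])
    r_before = model.shaped_reward(phi, extrinsic_reward=0.0)
    model.update(phi, target=0.0)
    r_after = model.shaped_reward(phi, extrinsic_reward=0.0)
    assert r_after < r_before

In this sketch the bonus is the predictive standard deviation scaled by a constant; the thesis may use a different feature map, target signal, or bonus form.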


