Show simple item record

Field | Value | Language
dc.contributor.author | Blau, Tom |
dc.date.accessioned | 2020-09-25 |
dc.date.available | 2020-09-25 |
dc.date.issued | 2020 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/23476 |
dc.description.abstract | Reinforcement learning is a powerful approach for learning control policies that solve sequential decision problems under unknown dynamics, such as robotic locomotion, object manipulation, and autonomous driving. A key drawback of reinforcement learning (RL) algorithms is their poor data efficiency: the number of data points required to learn a high-performing control policy is often in the millions, even for comparatively simple tasks. This thesis investigates the sources of inefficiency in reinforcement learning as well as ways to mitigate them, focusing on problems with continuous control, high-dimensional observations, and sparse rewards. The contributions of this thesis are the following. First, the thesis introduces a method to efficiently learn a control policy by imitating a small set of demonstrations from an expert, using variational inference to regularise learning. It is shown experimentally that the variational regularisation results in better initial policies that converge more quickly during fine-tuning, compared with standard learning-from-demonstration. Second, the thesis provides a method to efficiently generate demonstrations when an expert is not available, using algorithms from the field of planning to explore the state space more efficiently than state-of-the-art RL can. Theoretical results are provided that guarantee a demonstration can be found in finite time and that bound the asymptotic time complexity of finding one. Experimental results show that the method greatly improves overall data efficiency. Finally, the thesis presents an approach for improving exploration by modifying the reinforcement signal to discourage revisiting previously explored states. Bayesian linear regression is used to maintain a model with uncertainty information, and the agent is rewarded for exploring regions where uncertainty is high. Experiments show that the approach improves data efficiency in both simulated and physical environments. | en_AU
dc.language.iso | en | en_AU
dc.publisher | University of Sydney | en_AU
dc.subject | reinforcement learning | en_AU
dc.subject | planning | en_AU
dc.subject | robotics | en_AU
dc.subject | Bayesian methods | en_AU
dc.title | Hybrid Methods for Efficient Exploration in Reinforcement Learning | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Ramos, Fabio |
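
The final contribution described in the abstract rewards the agent for visiting states where a Bayesian linear regression model is uncertain. The thesis text itself is not part of this record, so the following is only a minimal sketch of that general idea under assumptions of this page's editor; the class name, feature construction, and bonus scale beta are hypothetical and do not reproduce the author's implementation.

    # Illustrative only: Bayesian linear regression whose posterior predictive
    # variance is turned into an exploration bonus added to the task reward.
    import numpy as np

    class BayesianLinearUncertainty:
        def __init__(self, n_features, prior_precision=1.0, noise_var=0.1):
            self.noise_var = noise_var
            # Posterior over linear weights, initialised to an isotropic prior.
            self.precision = prior_precision * np.eye(n_features)
            self.b = np.zeros(n_features)  # running sum of phi * target / noise_var

        def update(self, phi, target):
            # Condition the posterior on one (feature vector, observed target) pair.
            phi = np.asarray(phi, dtype=float)
            self.precision += np.outer(phi, phi) / self.noise_var
            self.b += phi * target / self.noise_var

        def predictive_variance(self, phi):
            # Predictive variance at phi: noise variance + phi^T * covariance * phi.
            phi = np.asarray(phi, dtype=float)
            cov = np.linalg.inv(self.precision)
            return self.noise_var + phi @ cov @ phi

        def shaped_reward(self, phi, extrinsic_reward, beta=0.1):
            # Extrinsic reward plus a bonus that is large where the model is uncertain.
            return extrinsic_reward + beta * np.sqrt(self.predictive_variance(phi))

    # A state the model has not seen gets a large bonus; after an update at the
    # same features the bonus shrinks, discouraging revisits.
    model = BayesianLinearUncertainty(n_features=4)
    phi = np.array([0.2, -1.0, 0.5, 0.3])
    r_before = model.shaped_reward(phi, extrinsic_reward=0.0)
    model.update(phi, target=0.0)
    r_after = model.shaped_reward(phi, extrinsic_reward=0.0)
    assert r_after < r_before

In this sketch the bonus is the predictive standard deviation scaled by a constant; the thesis may use a different feature map, target signal, or bonus form.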


