Show simple item record

Field | Value | Language
dc.contributor.author | Morere, Philippe |
dc.date.accessioned | 2019-10-18 |
dc.date.available | 2019-10-18 |
dc.date.issued | 2019-01-31 |
dc.identifier.uri | https://hdl.handle.net/2123/21230 |
dc.description.abstract | This thesis addresses the problem of achieving efficient non-myopic decision making by explicitly balancing exploration and exploitation. Decision making, both in planning and reinforcement learning (RL), enables agents or robots to complete tasks by acting on their environments. Complexity arises when completing objectives requires sacrificing short-term performance in order to achieve better long-term performance. Decision-making algorithms with this characteristic are known as non-myopic, and require long sequences of actions to be evaluated, thereby greatly increasing the size of the search space. Optimal behaviours need to balance two key quantities: exploration and exploitation. Exploitation takes advantage of previously acquired information or high-performing solutions, whereas exploration focuses on acquiring more informative data. The balance between these quantities is crucial in both RL and planning. This thesis makes the following contributions. Firstly, a reward function trading off exploration and exploitation of gradients for sequential planning is proposed. It is based on Bayesian optimisation (BO) and is combined with a non-myopic planner to achieve efficient spatial monitoring. Secondly, the algorithm is extended to continuous action spaces; the resulting method, continuous belief tree search (CBTS), uses BO to dynamically sample actions within a tree search, balancing high-performing actions and novelty. Finally, the framework is extended to RL, for which a multi-objective methodology for explicit exploration and exploitation balance is proposed. The two objectives are modelled explicitly and balanced at a policy level, as in BO. This allows for online exploration strategies, as well as a data-efficient model-free RL algorithm achieving exploration by minimising the uncertainty of Q-values (EMU-Q). The proposed algorithms are evaluated on different simulated and real-world robotics problems, displaying superior performance in terms of sample efficiency and exploration. | en
dc.rights | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
dc.rights | The author retains copyright of this thesis |
dc.subject | Reinforcement Learning | en
dc.subject | Exploration | en
dc.subject | Planning | en
dc.subject | POMDP | en
dc.subject | Bayesian | en
dc.subject | Uncertainty | en
dc.title | Bayesian Optimisation for Planning And Reinforcement Learning | en
dc.type | Thesis | en
dc.type.thesis | Doctor of Philosophy | en
usyd.faculty | Faculty of Engineering, School of Computer Science | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en

