Bayesian Optimisation for Planning And Reinforcement Learning

Morere, Philippe

Access status:

Open Access

Field	Value	Language
dc.contributor.author	Morere, Philippe
dc.date.accessioned	2019-10-18
dc.date.available	2019-10-18
dc.date.issued	2019-01-31
dc.identifier.uri	https://hdl.handle.net/2123/21230
dc.description.abstract	This thesis addresses the problem of achieving efficient non-myopic decision making by explicitly balancing exploration and exploitation. Decision making, both in planning and reinforcement learning (RL), enables agents or robots to complete tasks by acting on their environments. Complexity arises when completing objectives requires sacrificing short-term performance in order to achieve better long-term performance. Decision-making algorithms with this characteristic are known as non-myopic, and require long sequences of actions to be evaluated, thereby greatly increasing the size of the search space. Optimal behaviours need to balance two key quantities: exploration and exploitation. Exploitation takes advantage of previously acquired information or high-performing solutions, whereas exploration focuses on acquiring more informative data. The balance between these quantities is crucial in both RL and planning. This thesis makes the following contributions. Firstly, a reward function trading off exploration and exploitation of gradients for sequential planning is proposed. It is based on Bayesian optimisation (BO) and is combined with a non-myopic planner to achieve efficient spatial monitoring. Secondly, the algorithm is extended to continuous action spaces; the resulting method, continuous belief tree search (CBTS), uses BO to dynamically sample actions within a tree search, balancing high-performing actions and novelty. Finally, the framework is extended to RL, for which a multi-objective methodology for explicit exploration and exploitation balance is proposed. The two objectives are modelled explicitly and balanced at a policy level, as in BO. This allows for online exploration strategies, as well as a data-efficient model-free RL algorithm achieving exploration by minimising the uncertainty of Q-values (EMU-Q).
The proposed algorithms are evaluated on different simulated and real-world robotics problems, displaying superior performance in terms of sample efficiency and exploration.	en
dc.rights	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
dc.subject	Reinforcement Learning	en
dc.subject	Exploration	en
dc.subject	Planning	en
dc.subject	POMDP	en
dc.subject	Bayesian	en
dc.subject	Uncertainty	en
dc.title	Bayesian Optimisation for Planning And Reinforcement Learning	en
dc.type	Thesis	en
dc.type.thesis	Doctor of Philosophy	en
usyd.faculty	Faculty of Engineering, School of Computer Science	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en