Robust Off-Policy Deep Reinforcement Learning

Bawa, Payal

Access status:

USyd Access

Field	Value	Language
dc.contributor.author	Bawa, Payal
dc.date.accessioned	2023-09-05T03:05:15Z
dc.date.available	2023-09-05T03:05:15Z
dc.date.issued	2023	en
dc.identifier.uri	https://hdl.handle.net/2123/31647
dc.description.abstract	Significant progress in Deep Learning has helped Deep Reinforcement Learning (RL) algorithms achieve human-like performance across applications ranging from games such as Go and chess to simple robotic tasks. Off-policy Deep RL algorithms in particular have shown promising results on a wide range of simulated tasks. However, they are encumbered by stability concerns, which prevent their real-world deployment. This thesis makes several contributions toward developing off-policy deep RL algorithms that are robust, scalable, sample-efficient and suitable for safety-critical applications. Our first contribution is Bagged Critic for Continuous Control (BC3). BC3 mitigates overestimation bias in off-policy actor-critic algorithms by employing an ensemble of state-value functions. Our second contribution is Spectral Normalized Actor Critic (SNAC). SNAC bounds the Lipschitz constants of the actor-critic networks in off-policy algorithms, which in turn bounds the gradients flowing through the networks. Bounded gradients help RL algorithms learn more robust and sample-efficient policies. Our last contribution is orthogonality-constrained actor-critic algorithms. Enforcing orthogonality on the weight matrices of actor-critic networks helps preserve the norm of the gradients, thus preventing vanishing gradients and avoiding convergence to suboptimal policies.	en
dc.language.iso	en	en
dc.subject	Deep Reinforcement Learning	en
dc.subject	Deep Learning	en
dc.subject	Reinforcement learning	en
dc.title	Robust Off-Policy Deep Reinforcement Learning	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Civil Engineering	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Ramos, Fabio