Robust Off-Policy Deep Reinforcement Learning
| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Bawa, Payal | |
| dc.date.accessioned | 2023-09-05T03:05:15Z | |
| dc.date.available | 2023-09-05T03:05:15Z | |
| dc.date.issued | 2023 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/31647 | |
| dc.description.abstract | Significant progress in Deep Learning has helped Deep Reinforcement Learning (RL) algorithms achieve human-like performance across applications ranging from games like Go and chess to simple robotic tasks. Off-policy Deep RL algorithms in particular have shown promising results on a wide range of simulated tasks. However, they are encumbered by stability concerns, which prevents their real-world deployment. This thesis makes several contributions toward developing off-policy deep RL algorithms that are robust, scalable, sample-efficient and suitable for safety-critical applications. Our first contribution is Bagged Critic for Continuous Control (BC3). BC3 mitigates overestimation bias in off-policy actor-critic algorithms by employing an ensemble of state-value functions. Our second contribution is Spectral Normalized Actor Critic (SNAC). SNAC bounds the Lipschitz constants of the actor-critic networks in off-policy algorithms, which in turn bounds the gradients flowing through the network. Bounded gradients help RL algorithms learn more robust and sample-efficient policies. Our last contribution is orthogonality-constrained actor-critic algorithms. Enforcing orthogonality on the weight matrices of actor-critic networks helps preserve the norm of the gradients, thus preventing vanishing gradients and avoiding convergence to suboptimal policies. | en |
| dc.language.iso | en | en |
| dc.subject | Deep Reinforcement Learning | en |
| dc.subject | Deep Learning | en |
| dc.subject | Reinforcement learning | en |
| dc.title | Robust Off-Policy Deep Reinforcement Learning | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Civil Engineering | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Ramos, Fabio | |