Combining Actor-Critic Methods with Model Predictive Control via Stein Variational Inference

Cai, Shizhe

Access status:

USyd Access

Field	Value	Language
dc.contributor.author	Cai, Shizhe
dc.date.accessioned	2026-01-22T23:22:26Z
dc.date.available	2026-01-22T23:22:26Z
dc.date.issued	2026	en
dc.identifier.uri	https://hdl.handle.net/2123/34759
dc.description.abstract	Deep Reinforcement Learning (DRL) has demonstrated remarkable success in continuous control tasks. However, it often requires extensive training data, struggles with complex long-horizon planning, and may fail to maintain safety constraints during operation. Meanwhile, Model Predictive Control (MPC) provides explainability and constraint satisfaction but typically leads to only locally optimal solutions and demands careful manual design of cost functions. To address these complementary limitations, this thesis develops and validates Q-guided Stein variational model predictive Actor-Critic (Q-STAC), a novel framework that bridges these approaches by integrating Bayesian Model Predictive Control (Bayesian MPC) with actor-critic reinforcement learning through Stein Variational Gradient Descent (SVGD). A core innovation within this framework is the direct optimization of control sequences using learned Q-values as objectives, an approach that eliminates the need for explicit cost function design while leveraging the dynamics of the system to improve sample efficiency and ensure that control signals remain within safe boundaries. Extensive experiments on 2D navigation, robotic manipulation tasks, and a real-world picking task demonstrate that Q-STAC achieves superior sample efficiency, robustness, and optimality compared to State-of-the-Art (SOTA) algorithms.	en
dc.language.iso	en	en
dc.subject	Reinforcement Learning	en
dc.subject	Model Predictive Control	en
dc.subject	Bayesian Inference	en
dc.subject	Stein Variational Gradient Descent	en
dc.title	Combining Actor-Critic Methods with Model Predictive Control via Stein Variational Inference	en
dc.type	Thesis
dc.type.thesis	Masters by Research	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Master of Philosophy M.Phil	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Ramos, Fabio
usyd.include.pub	No	en