Combining Actor-Critic Methods with Model Predictive Control via Stein Variational Inference
Access status:
USyd Access
Type
Thesis
Thesis type
Masters by Research
Author/s
Cai, Shizhe
Abstract
Deep Reinforcement Learning (DRL) has demonstrated remarkable success in continuous control tasks. However, it often requires extensive training data, struggles with complex long-horizon planning, and may fail to maintain safety constraints during operation. Meanwhile, Model Predictive Control (MPC) provides explainability and constraint satisfaction but typically finds only locally optimal solutions and demands careful manual design of cost functions. To address these complementary limitations, this thesis develops and validates Q-guided Stein variational model predictive Actor-Critic (Q-STAC), a novel framework that bridges these approaches by integrating Bayesian Model Predictive Control (Bayesian MPC) with actor-critic reinforcement learning through Stein Variational Gradient Descent (SVGD). A core innovation within this framework is the direct optimization of control sequences using learned Q-values as objectives. This approach eliminates the need for explicit cost function design, leverages the system dynamics to improve sample efficiency, and constrains control signals to remain within safe boundaries. Extensive experiments on 2D navigation, robotic manipulation tasks, and a real-world picking task demonstrate that Q-STAC achieves superior sample efficiency, robustness, and optimality compared to state-of-the-art (SOTA) algorithms.
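As a rough illustration of the idea summarised above, the following is a minimal sketch of an SVGD update over bounded control sequences. It is not the thesis's implementation. The toy dynamics, the `score` function (a hand-written stand-in for a learned Q-value), the RBF kernel bandwidth, the step size, the bound `U_MAX`, and every function name are assumptions made purely for illustration, as is the choice to treat the score as an unnormalised log-density.

```python
import math
import random

U_MAX = 0.5   # assumed bound on each control signal
GOAL = 1.0    # assumed target state for the toy task

def score(u):
    """Hypothetical stand-in for a learned Q-value (higher is better)."""
    x = sum(u)  # toy dynamics: x_{t+1} = x_t + u_t, starting from x_0 = 0
    return -(x - GOAL) ** 2 - 0.1 * sum(a * a for a in u)

def score_grad(u):
    """Analytic gradient of `score` with respect to the control sequence."""
    x = sum(u)
    return [-2.0 * (x - GOAL) - 0.2 * a for a in u]

def svgd_step(particles, step=0.05, bandwidth=1.0):
    """One SVGD update with an RBF kernel, then clipping to the bounds."""
    n = len(particles)
    grads = [score_grad(p) for p in particles]
    updated = []
    for xi in particles:
        phi = [0.0] * len(xi)
        for xj, gj in zip(particles, grads):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xj, xi))
            k = math.exp(-sq_dist / bandwidth)
            for d in range(len(xi)):
                attract = k * gj[d]                               # follow the score
                repel = (-2.0 / bandwidth) * (xj[d] - xi[d]) * k  # spread particles
                phi[d] += attract + repel
        # Project the updated sequence back into the assumed safe interval.
        updated.append([min(U_MAX, max(-U_MAX, a + step * f / n))
                        for a, f in zip(xi, phi)])
    return updated

def optimize(num_particles=8, horizon=4, iters=300, seed=0):
    """Run SVGD from small random control sequences near zero."""
    rng = random.Random(seed)
    particles = [[rng.uniform(-0.1, 0.1) for _ in range(horizon)]
                 for _ in range(num_particles)]
    for _ in range(iters):
        particles = svgd_step(particles)
    return particles
```

Under these assumptions, calling `optimize()` returns a set of control sequences that all respect the bound. One option in a receding-horizon loop would be to execute the first control of the highest-scoring sequence and then replan.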
Date
2026
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney