Behavioral Dataset Compression for Efficient Reinforcement Learning
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Lei, Shiye
Abstract
Offline reinforcement learning (offline RL) provides a principled framework for learning decision-making policies from fixed datasets without environment interaction, enabling applications in safety-critical, privacy-sensitive, and resource-constrained settings. However, modern offline RL systems often rely on large-scale datasets collected from suboptimal policies, leading to substantial computational overhead and limited scalability. Improving data efficiency is therefore critical for making offline RL practically viable. In this thesis, we develop dataset compression algorithms for offline RL that explicitly account for the intrinsic properties of offline behavioral data. From the action perspective, we establish a theoretical equivalence between the policy performance gap and an action-value-weighted decision discrepancy. This insight motivates an action-value-weighted objective for offline behavior distillation (OBD), which distills large offline RL datasets into compact synthetic training sets. From the state perspective, we identify state diversity as a key factor governing the effectiveness of offline behavior distillation. We show that insufficient state coverage in the original dataset limits policy performance after compression. To address this issue, we propose state-weighted OBD, which explicitly incorporates state diversity into the distillation objective and significantly improves robustness to dataset compression. Finally, by jointly considering action-value information, state density, and trajectory-level sequential structure, we propose stepwise dual ranking (SDR), a simple and scalable coreset selection algorithm that constructs compact yet informative subsets from large offline behavioral datasets without additional training overhead.
Date
2026
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney