Behavioral Dataset Compression for Efficient Reinforcement Learning

Lei, Shiye

Access status: USyd Access

Field	Value	Language
dc.contributor.author	Lei, Shiye
dc.date.accessioned	2026-06-23T05:53:17Z
dc.date.available	2026-06-23T05:53:17Z
dc.date.issued	2026	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/35446
dc.description	Includes publication
dc.description.abstract	Offline reinforcement learning (offline RL) provides a principled framework for learning decision-making policies from fixed datasets without environment interaction, enabling applications in safety-critical, privacy-sensitive, and resource-constrained settings. However, modern offline RL systems often rely on large-scale datasets collected from suboptimal policies, leading to substantial computational overhead and limited scalability. Improving data efficiency is therefore critical for making offline RL practically viable. In this thesis, we develop dataset compression algorithms for offline RL that explicitly account for the intrinsic properties of offline behavioral data, considered from both the action and the state perspective. From the action perspective, we establish a theoretical equivalence between the policy performance gap and an action-value-weighted decision discrepancy. This insight motivates an action-value-weighted objective for offline behavior distillation (OBD), which distills large offline RL datasets into compact synthetic training sets. From the state perspective, we identify state diversity as a key factor governing the effectiveness of offline behavior distillation. We show that insufficient state coverage in the original dataset limits policy performance after compression. To address this issue, we propose state-weighted OBD, which explicitly incorporates state diversity into the distillation objective and significantly improves robustness to dataset compression. Finally, by jointly considering action-value information, state density, and trajectory-level sequential structure, we propose stepwise dual ranking (SDR), a simple and scalable coreset selection algorithm that constructs compact yet informative subsets from large offline behavioral datasets without additional training overhead.	en_AU
dc.language.iso	en	en_AU
dc.subject	Data Compression	en_AU
dc.subject	Dataset Distillation	en_AU
dc.subject	Offline Reinforcement Learning	en_AU
dc.title	Behavioral Dataset Compression for Efficient Reinforcement Learning	en_AU
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en_AU
usyd.degree	Doctor of Philosophy Ph.D.	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Liu, Tongliang
usyd.include.pub	Yes	en_AU