Show simple item record

Field | Value | Language
dc.contributor.author | Lei, Shiye |
dc.date.accessioned | 2026-06-23T05:53:17Z |
dc.date.available | 2026-06-23T05:53:17Z |
dc.date.issued | 2026 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/35446 |
dc.description | Includes publication |
dc.description.abstract | Offline reinforcement learning (offline RL) provides a principled framework for learning decision-making policies from fixed datasets without environment interaction, enabling applications in safety-critical, privacy-sensitive, and resource-constrained settings. However, modern offline RL systems often rely on large-scale datasets collected from suboptimal policies, leading to substantial computational overhead and limited scalability. Improving data efficiency is therefore critical for making offline RL practically viable. In this thesis, we develop dataset compression algorithms for offline RL that explicitly account for these intrinsic data properties. From the action perspective, we establish a theoretical equivalence between the policy performance gap and an action-value-weighted decision discrepancy. This insight motivates an action-value-weighted objective for offline behavior distillation (OBD), which distills large offline RL datasets into compact synthetic training sets. From the state perspective, we identify state diversity as a key factor governing the effectiveness of offline behavior distillation. We show that insufficient state coverage in the original dataset limits policy performance after compression. To address this issue, we propose state-weighted OBD, which explicitly incorporates state diversity into the distillation objective and significantly improves robustness to dataset compression. Finally, by jointly considering action-value information, state density, and trajectory-level sequential structure, we propose stepwise dual ranking (SDR), a simple and scalable coreset selection algorithm that constructs compact yet informative subsets from large offline behavioral datasets without additional training overhead. | en_AU
dc.language.iso | en | en_AU
dc.subject | Data Compression | en_AU
dc.subject | Dataset Distillation | en_AU
dc.subject | Offline Reinforcement Learning | en_AU
dc.title | Behavioral Dataset Compression for Efficient Reinforcement Learning | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Liu, Tongliang |
usyd.include.pub | Yes | en_AU



There are no previous versions of the item available.