Advanced Learning-Based Approaches for 3D Realistic Perception

Zhao, Runkai

Access status: USyd Access

Field	Value	Language
dc.contributor.author	Zhao, Runkai
dc.date.accessioned	2025-11-03T04:36:18Z
dc.date.available	2025-11-03T04:36:18Z
dc.date.issued	2025	en
dc.identifier.uri	https://hdl.handle.net/2123/34461
dc.description	Includes publication
dc.description.abstract	3D realistic perception refers to the faithful ego-centric understanding of real-world object attributes and their precise spatial localization, while maintaining geometric and semantic consistency. In contrast to purely semantic 2D perception, it provides spatially grounded representations, such as bird’s-eye-view (BEV) maps, 3D curves, occupancy, and object trajectories, that are actionable for planning and control. Such perception can be derived from a variety of sensing modalities, including LiDAR, cameras, or their multi-modal fusion. Despite progress across these modalities, the field is impeded by inherent challenges such as occlusion, long-range sparsity, adverse weather conditions, and domain shifts. In safety-critical applications like autonomous driving and embodied intelligence, the success of 3D realistic perception ultimately depends on achieving high metric accuracy, robustness, and computational efficiency. This thesis advances learning-based 3D perception for lane-line and dynamic-object understanding in challenging driving environments. To address the scarcity and annotation burden of LiDAR data, it introduces LiSV-3DLane, the first large-scale surround-view 3D lane dataset with enriched semantic annotations. Building on this resource, LiLaDet projects LiDAR geometry into a BEV representation to recover 3D lanes precisely and to broaden the applicability of LiDAR-based perception. To reduce the high cost of point-cloud processing, LaneCMKT transfers 3D cues from a LiDAR teacher to a monocular image student via cross-modal distillation, improving detection robustness under adverse conditions. Finally, BeXT (Bringing eXpertises Together) integrates complementary Visual Foundation Models (VFMs) into a lightweight monocular encoder through expertise adapter pretraining and dynamic feature routing. 
Collectively, these contributions establish a pathway from LiDAR to scalable monocular deployment, enabling robust, efficient, and metrically faithful 3D realistic perception.	en
dc.language.iso	en	en
dc.subject	Artificial Intelligence	en
dc.subject	Deep Learning	en
dc.subject	3D Computer Vision	en
dc.subject	Multi-channel Data Understanding	en
dc.subject	Autonomous Driving	en
dc.subject	3D Realistic Perception	en
dc.title	Advanced Learning-Based Approaches for 3D Realistic Perception	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Cai, Tom
usyd.include.pub	Yes	en