
Field | Value | Language
dc.contributor.author | Zhao, Runkai |
dc.date.accessioned | 2025-11-03T04:36:18Z |
dc.date.available | 2025-11-03T04:36:18Z |
dc.date.issued | 2025 | en
dc.identifier.uri | https://hdl.handle.net/2123/34461 |
dc.description | Includes publication |
dc.description.abstract | 3D realistic perception refers to the faithful ego-centric understanding of real-world object attributes and their precise spatial localization, while maintaining geometric and semantic consistency. In contrast to purely semantic 2D perception, it provides spatially grounded representations (such as bird’s-eye-view (BEV) maps, 3D curves, occupancy, and object trajectories) that are actionable for planning and control. Such perception can be derived from a variety of sensing modalities, including LiDAR, cameras, or multi-modal fusion approaches. Despite these advances, the field is impeded by inherent challenges such as occlusion, long-range sparsity, adverse weather conditions, and domain shifts. In safety-critical applications like autonomous driving and embodied intelligence, the success of 3D realistic perception ultimately depends on achieving high metric accuracy, robustness, and computational efficiency. This thesis advances learning-based 3D perception for lane-line and dynamic-object understanding in challenging driving environments. To address the scarcity and annotation burden of LiDAR data, it introduces LiSV-3DLane, the first large-scale surround-view 3D lane dataset with enriched semantic annotations. Based on this resource, LiLaDet projects LiDAR geometry into a BEV representation for precise 3D lane recovery and broader applicability of LiDAR-based perception. To reduce the high cost of point-cloud processing, LaneCMKT transfers 3D cues from a LiDAR teacher to a monocular image student via cross-modal distillation, improving detection robustness under adverse conditions. Finally, BeXT (Bringing eXpertises Together) integrates complementary Visual Foundation Models (VFMs) into a lightweight monocular encoder through expertise adapter pretraining and dynamic feature routing. Collectively, these contributions establish a pathway from LiDAR to scalable monocular deployment, enabling robust, efficient, and metrically faithful 3D realistic perception. | en
dc.language.iso | en | en
dc.subject | Artificial Intelligence | en
dc.subject | Deep Learning | en
dc.subject | 3D Computer Vision | en
dc.subject | Multi-channel Data Understanding | en
dc.subject | Autonomous Driving | en
dc.subject | 3D Realistic Perception | en
dc.title | Advanced Learning-Based Approaches for 3D Realistic Perception | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Cai, Tom |
usyd.include.pub | Yes | en


There are no previous versions of the item available.