Show simple item record

Field | Value | Language
dc.contributor.author | Huang, Di |
dc.date.accessioned | 2025-05-07T06:09:03Z |
dc.date.available | 2025-05-07T06:09:03Z |
dc.date.issued | 2025 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/33878 |
dc.description.abstract | This thesis explores 3D reconstruction and understanding, two capabilities essential for intelligent systems to perceive and reason about the world. By emphasizing the complementarity of these two tasks, it addresses the limits in precision, scalability, and generalization that constrain real-world 3D pipelines, aiming to recover accurate geometry enriched with semantic meaning for robotics, autonomous navigation, and augmented reality. Reconstruction methods must handle occlusions, reflections, and sparse data, while understanding requires capturing both fine-grained details and high-level context across diverse objects and motions. We tackle these linked challenges by jointly optimizing geometric and semantic processes through shared representations, revealing how each can inform and strengthen the other. The work is structured around four contributions: a geometry-driven method for high-fidelity monocular reconstruction of hand-held objects without learned priors; Ponder, a point-cloud pretraining paradigm that uses differentiable rendering of RGB-D data to enhance detection, segmentation, and reconstruction; MotionGPT, a multimodal model uniting language and geometry encoders to generate realistic human motion under varied control signals; and Agent3D-Zero, a zero-shot 3D understanding system that iteratively selects viewpoints and synthesizes knowledge from meshes via visual prompts to large language models, eliminating the need for extensive 3D training data. Extensive experiments demonstrate state-of-the-art performance in object reconstruction, semantic segmentation, motion synthesis, and scene understanding. By integrating geometric and semantic reasoning, pretraining strategies, and multimodal cues, this work establishes a unified framework for 3D scene interpretation that advances theoretical boundaries and delivers practical benefits, from digital content creation to human–robot interaction, paving the way for next-generation intelligent systems. | en_AU
dc.language.iso | en | en_AU
dc.subject | 3D reconstruction | en_AU
dc.subject | 3D scene understanding | en_AU
dc.subject | monocular video reconstruction | en_AU
dc.subject | point-cloud pretraining | en_AU
dc.subject | human motion synthesis | en_AU
dc.subject | zero-shot 3D interpretation | en_AU
dc.title | 3D Reconstruction and Understanding | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Li, Yonghui |

