Show simple item record

Field | Value | Language
dc.contributor.author | Huang, Di |
dc.date.accessioned | 2025-05-07T06:09:03Z |
dc.date.available | 2025-05-07T06:09:03Z |
dc.date.issued | 2025 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/33878 |
dc.description.abstract | This thesis explores 3D reconstruction and understanding, two capabilities essential for intelligent systems to perceive and reason about the world. By emphasizing the complementarity of these two tasks, it addresses the limits in precision, scalability, and generalization that constrain real-world 3D pipelines, aiming to recover accurate geometry enriched with semantic meaning for robotics, autonomous navigation, and augmented reality. Reconstruction methods must handle occlusions, reflections, and sparse data, while understanding requires capturing both fine-grained details and high-level context across diverse objects and motions. We tackle these linked challenges by jointly optimizing geometric and semantic processes through shared representations, revealing how each can inform and strengthen the other. The work is structured around four contributions: a geometry-driven method for high-fidelity monocular reconstruction of hand-held objects without learned priors; Ponder, a point-cloud pretraining paradigm that uses differentiable rendering of RGB-D data to enhance detection, segmentation, and reconstruction; MotionGPT, a multimodal model uniting language and geometry encoders to generate realistic human motion under varied control signals; and Agent3D-Zero, a zero-shot 3D understanding system that iteratively selects viewpoints and synthesizes knowledge from meshes via visual prompts to large language models, eliminating the need for extensive 3D training data. Extensive experiments demonstrate state-of-the-art performance in object reconstruction, semantic segmentation, motion synthesis, and scene understanding. By integrating geometric and semantic reasoning, pretraining strategies, and multimodal cues, this work establishes a unified framework for 3D scene interpretation that advances theoretical boundaries and delivers practical benefits, from digital content creation to human–robot interaction, paving the way for next-generation intelligent systems. | en_AU
dc.language.iso | en | en_AU
dc.subject | 3D reconstruction | en_AU
dc.subject | 3D scene understanding | en_AU
dc.subject | monocular video reconstruction | en_AU
dc.subject | point-cloud pretraining | en_AU
dc.subject | human motion synthesis | en_AU
dc.subject | zero-shot 3D interpretation | en_AU
dc.title | 3D Reconstruction and Understanding | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Li, Yonghui |

