3D computer vision and visual data analysis
Field | Value | Language |
dc.contributor.author | Yu, Jianhui | |
dc.date.accessioned | 2024-03-05T02:53:19Z | |
dc.date.available | 2024-03-05T02:53:19Z | |
dc.date.issued | 2024 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/32300 | |
dc.description | Includes publication | |
dc.description.abstract | This thesis delves into the domain of 3D data analysis, an area of immense significance in fields including computer graphics, virtual reality, and medical imaging. While 2D data has been extensively studied in computer vision, 3D data introduces an additional layer of complexity, arising either from an added spatial dimension or from a temporal aspect in video data. This research focuses on three forms of 3D data: point clouds, human meshes, and face videos. In point cloud analysis, we address key tasks including classification, segmentation, and semantic segmentation. We first investigate medical point clouds, proposing a transformer-based model with a novel attention mechanism and a graph reasoning module for classification and segmentation tasks. We also introduce a method for rotation-invariant feature learning, improving both robustness and computational efficiency. Moving to 3D human modeling, our work explores text-guided human texture generation. Traditional 3D modeling techniques often fall short in capturing the nuanced textural details of human models. We propose a deep learning framework that combines diffusion generative models with physically based rendering and a 3D coordinate network; this method generates high-quality textures and ensures they align semantically with the input texts. In the realm of face video data, we begin by proposing a generative adversarial network pipeline for synthesizing faces and predicting micro-expression labels. We also introduce a large-scale face video dataset, complete with textual descriptions, and present a novel text-to-face generation model using bidirectional transformers and an innovative video token technique. Our experiments demonstrate both the superiority of our method and the high quality of the face dataset.
Overall, this thesis contributes significantly to 3D data processing, demonstrating strong potential in point cloud analysis, 3D human modeling, and face video processing, and promising both research and practical advancements. | en_AU |
dc.language.iso | en | en_AU |
dc.subject | 3D data analysis | en_AU |
dc.subject | generative modeling | en_AU |
dc.subject | point clouds | en_AU |
dc.title | 3D computer vision and visual data analysis | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
usyd.degree | Doctor of Philosophy (Ph.D.) | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Cai, Weidong | |
usyd.include.pub | Yes | en_AU |