Deep 3D Information Prediction and Understanding
Access status:
Open Access
Type
ThesisThesis type
Doctor of PhilosophyAuthor/s
Zhao, ShanshanAbstract
3D information prediction and understanding play significant roles in 3D visual perception. For 3D information prediction, recent studies have demonstrated the superiority of deep neural networks. Despite the great success of deep learning, there are still many challenging issues ...
See more3D information prediction and understanding play significant roles in 3D visual perception. For 3D information prediction, recent studies have demonstrated the superiority of deep neural networks. Despite the great success of deep learning, there are still many challenging issues to be solved. One crucial issue is how to learn the deep model in an unsupervised learning framework. In this thesis, we take monocular depth estimation as an example to study this problem through exploring the domain adaptation technique. Apart from the prediction from a single image or multiple images, we can also estimate the depth from multi-modal data, such as RGB image data coupled with 3D laser scan data. Since the 3D data is usually sparse and irregularly distributed, we are required to model the contextual information from the sparse data and fuse the multi-modal features. We examine the issues by studying the depth completion task. For 3D information understanding, such as point clouds analysis, due to the sparsity and unordered property of 3D point cloud, instead of the conventional convolution, new operations which can model the local geometric shape are required. We design a basic operation for point cloud analysis through introducing a novel adaptive edge-to-edge interaction learning module. Besides, due to the diversity in configurations of the 3D laser scanners, the captured 3D data often varies from dataset to dataset in object size, density, and viewpoints. As a result, the domain generalization in 3D data analysis is also a critical problem. We study this issue in 3D shape classification by proposing an entropy regularization term. Through studying four specific tasks, this thesis focuses on several crucial issues in deep 3D information prediction and understanding, including model designing, multi-modal fusion, sparse data analysis, unsupervised learning, domain adaptation, and domain generalization.
See less
See more3D information prediction and understanding play significant roles in 3D visual perception. For 3D information prediction, recent studies have demonstrated the superiority of deep neural networks. Despite the great success of deep learning, there are still many challenging issues to be solved. One crucial issue is how to learn the deep model in an unsupervised learning framework. In this thesis, we take monocular depth estimation as an example to study this problem through exploring the domain adaptation technique. Apart from the prediction from a single image or multiple images, we can also estimate the depth from multi-modal data, such as RGB image data coupled with 3D laser scan data. Since the 3D data is usually sparse and irregularly distributed, we are required to model the contextual information from the sparse data and fuse the multi-modal features. We examine the issues by studying the depth completion task. For 3D information understanding, such as point clouds analysis, due to the sparsity and unordered property of 3D point cloud, instead of the conventional convolution, new operations which can model the local geometric shape are required. We design a basic operation for point cloud analysis through introducing a novel adaptive edge-to-edge interaction learning module. Besides, due to the diversity in configurations of the 3D laser scanners, the captured 3D data often varies from dataset to dataset in object size, density, and viewpoints. As a result, the domain generalization in 3D data analysis is also a critical problem. We study this issue in 3D shape classification by proposing an entropy regularization term. Through studying four specific tasks, this thesis focuses on several crucial issues in deep 3D information prediction and understanding, including model designing, multi-modal fusion, sparse data analysis, unsupervised learning, domain adaptation, and domain generalization.
See less
Date
2022Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.Faculty/School
Faculty of Engineering, School of Computer ScienceAwarding institution
The University of SydneyShare