Enhancing Novel-class Generalization of Deep Learning Models for Vision Tasks
Field | Value | Language |
dc.contributor.author | Yang, Xianghui | |
dc.date.accessioned | 2024-05-08T05:56:38Z | |
dc.date.available | 2024-05-08T05:56:38Z | |
dc.date.issued | 2024 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/32537 | |
dc.description | Includes publication | |
dc.description.abstract | The field of artificial intelligence has witnessed remarkable progress due to the advancement of deep neural networks. However, improving the ability of deep learning models to perform well on out-of-distribution data remains a major challenge. This doctoral thesis addresses the issue of generalization in deep learning for 2D, 2D-to-3D, and 3D tasks. For the 2D task of few-shot semantic segmentation, we propose a novel framework named BriNet with two key contributions. First, we introduce an information exchange module that augments the feature representations of both support and query images, and we devise a more fine-grained way to localize the objects in the query image. Second, we propose a new online refinement strategy to adapt the trained model to unseen test objects. Shifting from the 2D task to the 2D-to-3D task, specifically single-view 3D mesh reconstruction, we present a novel framework, GenMesh, with three strategies to improve the model's generalization to novel classes and prevent overfitting: learning an intermediate point cloud representation, employing local features, and introducing a multi-view silhouette loss for model regularization. For the 3D task of surface reconstruction, we introduce a novel 3D representation called Neural Vector Fields (NVF). Building on this representation, we present two frameworks that use cross-category information to enhance generalization to novel classes. The first framework, NVF (lite), employs a hard codebook and serves as a precursor to the second, NVF (ultra), which incorporates a soft codebook and introduces zero-curl and direction regularization to further enhance generalization. Extensive experiments conducted in this thesis validate the effectiveness of the proposed methods in improving the generalization capacity of deep learning models across these tasks. | en_AU |
dc.language.iso | en | en_AU |
dc.subject | computer vision | en_AU |
dc.subject | deep learning | en_AU |
dc.subject | model generalization | en_AU |
dc.subject | few-shot segmentation | en_AU |
dc.subject | 3D reconstruction | en_AU |
dc.subject | novel data | en_AU |
dc.title | Enhancing Novel-class Generalization of Deep Learning Models for Vision Tasks | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Zhou, Luping | |
usyd.include.pub | Yes | en_AU |