Enhancing Novel-class Generalization of Deep Learning Models for Vision Tasks

Yang, Xianghui

Access status:

Open Access

Field	Value	Language
dc.contributor.author	Yang, Xianghui
dc.date.accessioned	2024-05-08T05:56:38Z
dc.date.available	2024-05-08T05:56:38Z
dc.date.issued	2024	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/32537
dc.description	Includes publication
dc.description.abstract	The field of artificial intelligence has witnessed remarkable progress due to the advancement of deep neural networks. However, improving the ability of deep learning models to perform well on out-of-distribution data remains a major challenge. This doctoral thesis addresses the issue of generalization in deep learning for 2D, 2D-to-3D, and 3D tasks. For the 2D task of few-shot semantic segmentation, we propose a novel framework named BriNet with two key contributions. First, we introduce an information exchange module that augments the feature representations of both support and query images, and we devise a more fine-grained way to localize the objects in the query image. Second, we propose a new online refinement strategy to adapt the trained model to unseen test objects. Shifting from the 2D task to the 2D-to-3D task, specifically single-view 3D mesh reconstruction, we present a novel framework, GenMesh, with three strategies to improve the model's generalization ability on novel classes and prevent overfitting: learning an intermediate point cloud representation, employing local features, and introducing a multi-view silhouette loss for model regularization. For the 3D task of surface reconstruction, we introduce a novel 3D representation called Neural Vector Fields (NVF). Leveraging this representation, we present two frameworks that utilize cross-category information to enhance generalization on novel classes. The first framework, NVF (lite), employs a hard codebook. The second, NVF (ultra), builds on it by incorporating a soft codebook and introducing zero-curl and direction regularization, further enhancing generalization capabilities. Extensive experiments conducted in this thesis validate the effectiveness of the proposed methodologies in improving the generalization capacity of deep learning models across various tasks.	en_AU
dc.language.iso	en	en_AU
dc.subject	computer vision	en_AU
dc.subject	deep learning	en_AU
dc.subject	model generalization	en_AU
dc.subject	few-shot segmentation	en_AU
dc.subject	3D reconstruction	en_AU
dc.subject	novel data	en_AU
dc.title	Enhancing Novel-class Generalization of Deep Learning Models for Vision Tasks	en_AU
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en_AU
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering	en_AU
usyd.degree	Doctor of Philosophy Ph.D.	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Zhou, Luping
usyd.include.pub	Yes	en_AU