Enhancing Novel-class Generalization of Deep Learning Models for Vision Tasks

Yang, Xianghui

Access status

Open Access

Type

Thesis

Thesis type

Doctor of Philosophy

Author/s

Yang, Xianghui

Abstract

The field of artificial intelligence has witnessed remarkable progress due to the advancement of deep neural networks. However, improving the ability of deep learning models to perform well on out-of-distribution data remains a major challenge. This doctoral thesis addresses the issue of generalization in deep learning across 2D, 2D-to-3D, and 3D tasks. For the 2D task of few-shot semantic segmentation, we propose a novel framework named BriNet with two key contributions. First, we introduce an information exchange module that augments the feature representations of both support and query images, and we devise a finer-grained way to localize the objects in the query image. Second, we propose a new online refinement strategy to adapt the trained model to unseen test objects. Shifting from the 2D task to the 2D-to-3D task, specifically single-view 3D mesh reconstruction, we present a novel framework, GenMesh, with three strategies to improve the model's generalization ability on novel classes and prevent overfitting: learning an intermediate point cloud representation, employing local features, and introducing a multi-view silhouette loss for model regularization. For the 3D task of surface reconstruction, we introduce a novel 3D representation called Neural Vector Fields (NVF). Building on this representation, we present two frameworks that use cross-category information to enhance generalization on novel classes. The first framework, NVF (lite), employs a hard codebook and serves as a precursor to the second, NVF (ultra), which incorporates a soft codebook and introduces zero-curl and direction regularization to further enhance generalization. Extensive experiments conducted in this thesis validate the effectiveness of the proposed methods in improving the generalization capacity of deep learning models across these tasks.

Date

2024

Rights statement

The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.

Faculty/School

Faculty of Engineering, School of Electrical and Information Engineering

Awarding institution

The University of Sydney

Subjects

computer vision
deep learning
model generalization
few-shot segmentation
3D reconstruction
novel data