Exploring Enhanced Visual Representation Learning for Improved Generative Modeling

Yue, Xiaoyu


Access status:

USyd Access

Type

Thesis

Thesis type

Doctor of Philosophy

Author/s

Yue, Xiaoyu

Abstract

Image generation is a pivotal research direction in computer vision because of its wide-ranging potential applications. Despite substantial advances driven by new generative paradigms and network architectures, existing models often fail to fully exploit the rich high-level semantic structure inherent in visual data. This insufficient use of semantic information limits their ability to accurately model complex real-world distributions. In this thesis, we address this limitation by integrating visual representation learning into generative frameworks, aiming to improve both the fidelity and the semantic coherence of generated images. Our research begins by investigating the intrinsic mechanisms of image generative models to verify their ability to learn high-level visual semantics. We propose a novel generative framework with a unified self-supervised training paradigm, called GUNS, which employs a diffusion decoder to integrate diverse self-supervised pre-training objectives within a single denoising diffusion model. We then leverage high-level visual semantics to enhance generative models, introducing three distinct methodologies that target different generative paradigms and components: (1) jointly learning semantic information within the generator; (2) injecting semantics during sampling; and (3) building a semantically aligned latent space. Extensive experiments validate the effectiveness of these three approaches and provide systematic evidence that high-quality visual semantic representations can actively enhance image generation. This research establishes a solid foundation for unifying image understanding and generation and aims to inspire future work on more semantically aware and controllable generative models.

Date

2025

Rights statement

The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.

Faculty/School

Faculty of Engineering, School of Electrical and Information Engineering

Awarding institution

The University of Sydney

Subjects

Deep Learning
Representation Learning
Image Generation