Exploring Enhanced Visual Representation Learning for Improved Generative Modeling

Yue, Xiaoyu

Access status: USyd Access

Field	Value	Language
dc.contributor.author	Yue, Xiaoyu
dc.date.accessioned	2026-03-30T02:39:17Z
dc.date.available	2026-03-30T02:39:17Z
dc.date.issued	2025	en
dc.identifier.uri	https://hdl.handle.net/2123/35046
dc.description.abstract	Image generation is a pivotal research direction in computer vision due to its wide-ranging potential applications. Despite substantial advances brought by the development of generative paradigms and network architectures, existing models often fail to fully exploit the rich high-level semantic structure inherent in visual data. The insufficient incorporation of such semantic information limits their ability to accurately model complex real-world distributions. In this thesis, we address this limitation by integrating visual representation learning into generative frameworks, aiming to improve both the fidelity and the semantic coherence of generated images. Our research begins by investigating the intrinsic mechanisms of image generative models to verify their ability to learn high-level visual semantics. We propose a novel generative framework with a unified self-supervised training paradigm called GUNS. It employs a diffusion decoder to integrate diverse self-supervised pre-training objectives within a single denoising diffusion model. We subsequently leverage high-level visual semantics to enhance generative models, introducing three distinct methodologies for different generative paradigms and components: (1) Jointly training semantic information within the generator. (2) Injecting semantics during sampling. (3) Building a semantically aligned latent space. Extensive experiments validate the effectiveness of these three approaches and provide systematic evidence that high-quality visual semantic representations can actively enhance image generation. This research establishes a solid foundation for unifying image understanding and generation and aims to inspire future work on developing more semantically aware and controllable generative models.	en
dc.language.iso	en	en
dc.subject	Deep Learning	en
dc.subject	Representation Learning	en
dc.subject	Image Generation	en
dc.title	Exploring Enhanced Visual Representation Learning for Improved Generative Modeling	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Zhou, Luping
usyd.include.pub	No	en