Show simple item record

Field | Value | Language
dc.contributor.author | Yue, Xiaoyu |
dc.date.accessioned | 2026-03-30T02:39:17Z |
dc.date.available | 2026-03-30T02:39:17Z |
dc.date.issued | 2025 | en
dc.identifier.uri | https://hdl.handle.net/2123/35046 |
dc.description.abstract | Image generation is a pivotal research direction in computer vision due to its wide-ranging potential applications. Despite substantial advances brought by the development of generative paradigms and network architectures, existing models often fail to fully exploit the rich high-level semantic structure inherent in visual data. The insufficient incorporation of such semantic information limits their ability to accurately model complex real-world distributions. In this thesis, we address this limitation by integrating visual representation learning into generative frameworks, aiming to improve both the fidelity and the semantic coherence of generated images. Our research begins by investigating the intrinsic mechanisms of image generative models to verify their ability to learn high-level visual semantics. We propose a novel generative framework with a unified self-supervised training paradigm called GUNS. It employs a diffusion decoder to integrate diverse self-supervised pre-training objectives within a single denoising diffusion model. We subsequently leverage high-level visual semantics to enhance generative models, introducing three distinct methodologies for different generative paradigms and components: (1) Jointly training semantic information within the generator. (2) Injecting semantics during sampling. (3) Building a semantically aligned latent space. Extensive experiments validate the effectiveness of these three approaches and provide systematic evidence that high-quality visual semantic representations can actively enhance image generation. This research establishes a solid foundation for unifying image understanding and generation and aims to inspire future work on developing more semantically aware and controllable generative models. | en
dc.language.iso | en | en
dc.subject | Deep Learning | en
dc.subject | Representation Learning | en
dc.subject | Image Generation | en
dc.title | Exploring Enhanced Visual Representation Learning for Improved Generative Modeling | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Zhou, Luping |
usyd.include.pub | No | en


