Show simple item record

Field | Value | Language
dc.contributor.author | Wang, Heng |
dc.date.accessioned | 2024-05-14T05:40:04Z |
dc.date.available | 2024-05-14T05:40:04Z |
dc.date.issued | 2024 | en
dc.identifier.uri | https://hdl.handle.net/2123/32554 |
dc.description | Includes publication |
dc.description.abstract | Artificial intelligence (AI) is now a transformative technology across domains ranging from daily creative work to scientific discovery. With the surge in available data and the development of AI-related techniques, AI-generated content (AIGC) and AI4Science are gaining increasing traction. In this thesis, we investigate the potential of deep-learning-based methods in cross-modal generation and neuroscience, addressing key facets of intelligent multimedia data analysis and processing. The first part of this thesis spans two primary investigations: 3D dense captioning and visually-guided sound generation. We first investigate 3D dense captioning, where objects within 3D indoor scenes are detected and described in human language. Recognizing the complexity inherent in 3D environments, we enhance the spatial understanding of our Transformer-based encoder-decoder architecture by incorporating spatial information into the attention-based encoder. In contrast to the well-established research on vision-and-language, vision-and-audio is an emerging field that has only recently received attention, owing to the complexity of audio signals. In particular, we address the open-domain vision-to-audio generation task, approaching it through the synergy of foundation models (FMs). In the second part of this thesis, we employ deep-learning-based techniques for the challenging task of 3D single neuron segmentation in neuroscience from two perspectives: architectural optimization and efficient utilization of limited datasets through representation learning. We first design graph-based information reasoning modules to jointly consider the local appearance and the global structures. We then propose a novel voxel-wise cross-volume SimSiam representation learning strategy, improving learning performance while maintaining the overall model architecture. Such developments will enable large-scale data-driven investigations in neuroscience and enhance our fundamental understanding of the human brain. | en
dc.language.iso | en | en
dc.subject | Multimodality AI | en
dc.subject | Generative AI | en
dc.subject | 3D Dense Captioning | en
dc.subject | Vision-to-Audio Generation | en
dc.subject | 3D Single Neuron Reconstruction | en
dc.subject | AIGC | en
dc.title | Intelligent Multimedia Data Analysis and Processing | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Cai, Weidong |
usyd.include.pub | Yes | en


There are no previous versions of the item available.