Show simple item record

Field: Value (Language)
dc.contributor.author: Long, Siqu
dc.date.accessioned: 2024-03-05T03:46:16Z
dc.date.available: 2024-03-05T03:46:16Z
dc.date.issued: 2024 (en_AU)
dc.identifier.uri: https://hdl.handle.net/2123/32303
dc.description: Includes publication
dc.description.abstract: Multi-modal/-aspect data contains complementary information about the same subject of interest, which holds promising potential for improving model robustness and has therefore attracted increasing research attention. There are two typical categories of multi-modal/-aspect problems that require cross-modal/-aspect alignment and integration: 1) heterogeneous multi-modal problems that deal with data from multiple media forms, such as text and images, and 2) homogeneous multi-aspect problems that handle data with different aspects represented in the same media form, such as the syntactic and semantic aspects of a textual sentence. However, most existing approaches to multi-modal/-aspect problems simply tackle cross-modal/-aspect alignment and integration implicitly through various deep neural networks and optimize based on the final task goals, leaving potential strategies for improving cross-modal/-aspect alignment and integration under-explored. This thesis initiates an exploration of strategies and approaches towards multi-modal/-aspect deep alignment and integration. By examining the limitations of existing approaches for both heterogeneous multi-modal problems and homogeneous multi-aspect problems, it proposes novel strategies and approaches for improving cross-modal/-aspect alignment and integration and evaluates them on the most representative tasks. For the heterogeneous setting, a graph-structured representation learning approach that captures cross-modal information is proposed to enforce better cross-modal alignment, evaluated in Language-to-Vision and Vision-and-Language scenarios. For the homogeneous setting, a bi-directional and deep cross-integration mechanism is explored to synthesise multi-level semantics for comprehensive text understanding, validated in the joint multi-aspect natural language understanding context and its generalised text understanding setting. (en_AU)
dc.language.iso: en (en_AU)
dc.subject: text image matching (en_AU)
dc.subject: text to image generation (en_AU)
dc.subject: multi-modal learning (en_AU)
dc.subject: joint intent classification and slot filling (en_AU)
dc.subject: multi-aspect learning (en_AU)
dc.title: Toward Multi-modal Multi-aspect Deep Alignment and Integration (en_AU)
dc.type: Thesis
dc.type.thesis: Doctor of Philosophy (en_AU)
dc.rights.other: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. (en_AU)
usyd.faculty: SeS faculties schools::Faculty of Engineering::School of Computer Science (en_AU)
usyd.degree: Doctor of Philosophy Ph.D. (en_AU)
usyd.awardinginst: The University of Sydney (en_AU)
usyd.advisor: Poon, Josiah
usyd.include.pub: Yes (en_AU)

