
Field: Value [Language]
dc.contributor.author: Bui, Anh Duc
dc.date.accessioned: 2023-02-03T02:47:50Z
dc.date.available: 2023-02-03T02:47:50Z
dc.date.issued: 2023 [en_AU]
dc.identifier.uri: https://hdl.handle.net/2123/29954
dc.description.abstract: Deep visual scene understanding is essential for high-level visual tasks such as storytelling and Visual Question Answering. One proposed solution is the scene graph, which represents the semantic details of an image as abstract elements in a graph structure suitable for both machine processing and human understanding. However, automatically generating reasonable and informative scene graphs remains a challenge because of the long-tail biases present in the available annotated data. This thesis therefore focuses on two aspects of generating scene graphs from images for visual understanding: how scene graphs can be generated with predicates that are both reasonable to human understanding and informative enough for downstream computer vision use, and how joint learning can be applied in the scene graph generation pipeline to further improve the quality of the output scene graphs. For the first aim, we address the problem that uncorrelated labels are classified against one another in the scene graph generation task; we tackle this by categorising correlated labels and learning category-specific predicate features. For the second aim, we propose a shuffle transformer that jointly learns the category-specific features to produce a more robust and informative universal predicate feature, which in turn yields better predicate labels for the scene graph. The performance of the proposed method is evaluated against state-of-the-art scene graph generation methods using the mean recall metric on the subset of Visual Genome most commonly used for scene graph generation. [en_AU]
dc.language.iso: en [en_AU]
dc.subject: scene graph [en_AU]
dc.subject: natural language processing [en_AU]
dc.subject: computer vision [en_AU]
dc.subject: machine learning [en_AU]
dc.subject: deep learning [en_AU]
dc.subject: artificial intelligence [en_AU]
dc.title: Visual Scene Understanding through Scene Graph Generation and Joint Learning [en_AU]
dc.type: Thesis
dc.type.thesis: Masters by Research [en_AU]
dc.rights.other: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. [en_AU]
usyd.faculty: SeS faculties schools::Faculty of Engineering::School of Computer Science [en_AU]
usyd.degree: Master of Philosophy M.Phil [en_AU]
usyd.awardinginst: University of Sydney [en_AU]
usyd.advisor: Poon, Josiah
usyd.include.pub: No [en_AU]
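The abstract above evaluates with the mean recall metric on Visual Genome. As a minimal sketch of one common variant of that metric (the function and variable names here are illustrative, not taken from the thesis), mean Recall@K averages per-predicate Recall@K so that rare "tail" predicates weigh as much as frequent "head" ones:

```python
from collections import defaultdict

def mean_recall_at_k(gt_triples, pred_triples, k):
    """Mean Recall@K (mR@K), macro-averaged over predicate classes.

    gt_triples:   {image_id: [(subj, pred, obj), ...]} ground-truth triples
    pred_triples: {image_id: [(subj, pred, obj), ...]} predictions, ranked
                  by confidence (highest first)
    """
    hits = defaultdict(int)    # ground-truth triples recovered, per predicate
    totals = defaultdict(int)  # ground-truth triples seen, per predicate
    for image_id, gt in gt_triples.items():
        top_k = set(pred_triples.get(image_id, [])[:k])
        for triple in gt:
            predicate = triple[1]
            totals[predicate] += 1
            if triple in top_k:
                hits[predicate] += 1
    # Average over predicate classes rather than over triples, so frequent
    # predicates cannot dominate the score the way they do in plain Recall@K.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls) if recalls else 0.0
```

This macro-averaging over predicate classes is what makes mR@K the standard measure for the long-tail bias problem the abstract describes: a model that only ever predicts the most frequent predicates can still score well on plain Recall@K, but not on mR@K.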

