Visual Scene Understanding through Scene Graph Generation and Joint Learning
Field | Value | Language |
--- | --- | --- |
dc.contributor.author | Bui, Anh Duc | |
dc.date.accessioned | 2023-02-03T02:47:50Z | |
dc.date.available | 2023-02-03T02:47:50Z | |
dc.date.issued | 2023 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/29954 | |
dc.description.abstract | Deep visual scene understanding is essential to the development of high-level visual understanding tasks such as storytelling or Visual Question Answering. One proposed solution for such purposes is the scene graph, which can represent the semantic details of an image as abstract elements in a graph structure suitable for both machine processing and human understanding. However, automatically generating reasonable and informative scene graphs remains a challenge due to the long-tail bias present in the available annotated data. This thesis therefore focuses on generating scene graphs from images for visual understanding in two main aspects: how scene graphs can be generated with object predicates that are both reasonable to human understanding and informative enough for further computer vision use, and how joint learning can be applied in the scene graph generation pipeline to further improve the quality of the output scene graphs. For the first aim, we address the problem in the scene graph generation task where uncorrelated labels are classified against one another; we tackle this by grouping correlated labels into categories and learning category-specific predicate features. For the second aim, a shuffle transformer is proposed to jointly learn the category-specific features, producing a more robust and informative universal predicate feature that is used to generate better predicate labels for the scene graph. The performance of the proposed methodology is then evaluated against state-of-the-art scene graph generation methods using the mean recall metric on the subset of Visual Genome most commonly used for scene graph generation. | en_AU |
dc.language.iso | en | en_AU |
dc.subject | scene graph | en_AU |
dc.subject | natural language processing | en_AU |
dc.subject | computer vision | en_AU |
dc.subject | machine learning | en_AU |
dc.subject | deep learning | en_AU |
dc.subject | artificial intelligence | en_AU |
dc.title | Visual Scene Understanding through Scene Graph Generation and Joint Learning | en_AU |
dc.type | Thesis | |
dc.type.thesis | Masters by Research | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
usyd.degree | Master of Philosophy (M.Phil) | en_AU |
usyd.awardinginst | University of Sydney | en_AU |
usyd.advisor | Poon, Josiah | |
usyd.include.pub | No | en_AU |