Visual Scene Understanding through Scene Graph Generation and Joint Learning
Field | Value | Language |
--- | --- | --- |
dc.contributor.author | Bui, Anh Duc | |
dc.date.accessioned | 2023-02-03T02:47:50Z | |
dc.date.available | 2023-02-03T02:47:50Z | |
dc.date.issued | 2023 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/29954 | |
dc.description.abstract | Deep visual scene understanding is essential to the development of high-level visual understanding tasks such as storytelling or Visual Question Answering. One proposed solution for such purposes is the scene graph, which can represent the semantic details of an image as abstract elements in a graph structure suitable for both machine processing and human understanding. However, automatically generating reasonable and informative scene graphs remains a challenge due to the long-tail bias present in the available annotated data. This thesis therefore focuses on generating scene graphs from images for visual understanding in two main aspects: how scene graphs can be generated with object predicates that are both reasonable to human understanding and informative enough for further computer vision use, and how joint learning can be applied in the scene graph generation pipeline to further improve the quality of the output scene graphs. For the first aim, we address the problem in the scene graph generation task where uncorrelated labels are classified against one another; we tackle this by grouping correlated labels into categories and learning category-specific predicate features. For the second aim, a shuffle transformer is proposed to jointly learn the category-specific features, producing a more robust and informative universal predicate feature that is used to generate better predicate labels for the scene graph. The performance of the proposed methodology is then evaluated against state-of-the-art scene graph generation methods using the mean recall metric on the subset of Visual Genome most commonly used for scene graph generation. | en_AU |
dc.language.iso | en | en_AU |
dc.subject | scene graph | en_AU |
dc.subject | natural language processing | en_AU |
dc.subject | computer vision | en_AU |
dc.subject | machine learning | en_AU |
dc.subject | deep learning | en_AU |
dc.subject | artificial intelligence | en_AU |
dc.title | Visual Scene Understanding through Scene Graph Generation and Joint Learning | en_AU |
dc.type | Thesis | |
dc.type.thesis | Masters by Research | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
usyd.degree | Master of Philosophy (M.Phil) | en_AU |
usyd.awardinginst | University of Sydney | en_AU |
usyd.advisor | Poon, Josiah | |
usyd.include.pub | No | en_AU |