Show simple item record

| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Chu, Shek Wai | |
| dc.date.accessioned | 2024-02-27T04:13:53Z | |
| dc.date.available | 2024-02-27T04:13:53Z | |
| dc.date.issued | 2024 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/32264 | |
| dc.description | Includes publication | |
| dc.description.abstract | Human pose estimation (HPE), the art of deciphering human body language from images and videos, holds immense potential across diverse fields. Yet, accurate pose estimation remains a challenge due to complex factors like joint relationships, diverse clothing, and intricate backgrounds. This thesis tackles three critical HPE hurdles: underutilized joint dependencies, lack of simple refinement networks, and the absence of long-range modeling. Firstly, we propose an enhanced ResNet hourglass model infused with the efficiency-boosting DenseNet module. This empowers precise pose prediction while streamlining resource demands. Additionally, we unlock the hidden potential within the ground truth data by utilizing intermediate points as output augmentations, further augmenting prediction accuracy. Next, we introduce a set of straightforward iterative subnetworks. These act as a robust baseline for refinement, iteratively improving pose accuracy, particularly for challenging areas like wrists and ankles. We explore the integration of large receptive field subnetworks, like axial transformers, for even greater performance gains. Finally, we delve into the power of self-attention and Graph Neural Networks (GNNs) to capture long-range dependencies and joint relationships. We propose two distinct frameworks: Self-Attention Query Skeleton Graph Neural Network (SASGNN) and Graph Attention Networks (GAT) with guided edge features. Both effectively combine self-attention and GNNs, with GAT offering a simplified yet competitive architecture. By addressing these key challenges, the proposed methodologies not only achieve competitive HPE accuracy but also excel in predicting difficult body parts. This paves the way for more precise and robust pose estimation systems with wide-ranging applications across various domains. | en |
| dc.language.iso | en | en |
| dc.subject | Advanced Multimedia Computing | en |
| dc.subject | Deep Learning | en |
| dc.subject | Computer Vision | en |
| dc.subject | Human Pose Estimation | en |
| dc.subject | Convolutional Neural Network | en |
| dc.subject | Graph Neural Network | en |
| dc.title | Advanced Multimedia Computing for Human Pose Estimation | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Cai, Weidong | |
| usyd.include.pub | Yes | en |

