Advanced Multimedia Computing for Human Pose Estimation

Chu, Shek Wai

Access status:

USyd Access

Field	Value	Language
dc.contributor.author	Chu, Shek Wai
dc.date.accessioned	2024-02-27T04:13:53Z
dc.date.available	2024-02-27T04:13:53Z
dc.date.issued	2024	en
dc.identifier.uri	https://hdl.handle.net/2123/32264
dc.description	Includes publication
dc.description.abstract	Human pose estimation (HPE), the art of deciphering human body language from images and videos, holds immense potential across diverse fields. Yet, accurate pose estimation remains a challenge due to complex factors like joint relationships, diverse clothing, and intricate backgrounds. This thesis tackles three critical HPE hurdles: underutilized joint dependencies, lack of simple refinement networks, and the absence of long-range modeling. Firstly, we propose an enhanced ResNet hourglass model infused with the efficiency-boosting DenseNet module. This empowers precise pose prediction while streamlining resource demands. Additionally, we unlock the hidden potential within the ground truth data by utilizing intermediate points as output augmentations, further improving prediction accuracy. Next, we introduce a set of straightforward iterative subnetworks. These act as a robust baseline for refinement, iteratively improving pose accuracy, particularly for challenging areas like wrists and ankles. We explore the integration of large receptive field subnetworks, like axial transformers, for even greater performance gains. Finally, we delve into the power of self-attention and Graph Neural Networks (GNNs) to capture long-range dependencies and joint relationships. We propose two distinct frameworks: Self-Attention Query Skeleton Graph Neural Network (SASGNN) and Graph Attention Networks (GAT) with guided edge features. Both effectively combine self-attention and GNNs, with GAT offering a simplified yet competitive architecture. By addressing these key challenges, the proposed methodologies not only achieve competitive HPE accuracy but also excel in predicting difficult body parts. This paves the way for more precise and robust pose estimation systems with wide-ranging applications across various domains.	en
dc.language.iso	en	en
dc.subject	Advanced Multimedia Computing	en
dc.subject	Deep Learning	en
dc.subject	Computer Vision	en
dc.subject	Human Pose Estimation	en
dc.subject	Convolutional Neural Network	en
dc.subject	Graph Neural Network	en
dc.title	Advanced Multimedia Computing for Human Pose Estimation	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Cai, Weidong
usyd.include.pub	Yes	en