Advanced Multimedia Computing for Human Pose Estimation
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Chu, Shek Wai
Abstract
Human pose estimation (HPE), the art of deciphering human body language from images and videos, holds immense potential across diverse fields. Yet accurate pose estimation remains a challenge due to complex factors such as joint relationships, diverse clothing, and intricate backgrounds. This thesis tackles three critical HPE hurdles: underutilized joint dependencies, the lack of simple refinement networks, and the absence of long-range dependency modeling. First, we propose an enhanced ResNet hourglass model that incorporates the efficiency-boosting DenseNet module, enabling precise pose prediction while reducing resource demands. We also exploit the ground truth data more fully by using intermediate points as output augmentations, which further improves prediction accuracy. Next, we introduce a set of straightforward iterative subnetworks. These act as a robust baseline for refinement, iteratively improving pose accuracy, particularly for challenging areas such as wrists and ankles. We also explore integrating subnetworks with large receptive fields, such as axial transformers, for further performance gains. Finally, we investigate self-attention and Graph Neural Networks (GNNs) as a means of capturing long-range dependencies and joint relationships. We propose two distinct frameworks: Self-Attention Query Skeleton Graph Neural Network (SASGNN) and Graph Attention Networks (GAT) with guided edge features. Both effectively combine self-attention and GNNs, with GAT offering a simplified yet competitive architecture. By addressing these key challenges, the proposed methodologies not only achieve competitive HPE accuracy but also excel at predicting difficult body parts. This paves the way for more precise and robust pose estimation systems with applications across a wide range of domains.
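The abstract mentions using intermediate points as output augmentations but does not say how those points are defined. The sketch below illustrates one plausible reading, assuming an intermediate point is the midpoint between two connected joints that is added as an extra prediction target. The joint names, the `SKELETON_EDGES` list, and the `add_intermediate_points` helper are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical toy skeleton: joint names and edges are illustrative only.
SKELETON_EDGES: List[Tuple[str, str]] = [
    ("shoulder", "elbow"),
    ("elbow", "wrist"),
]


def add_intermediate_points(
    joints: Dict[str, Point],
    edges: List[Tuple[str, str]] = SKELETON_EDGES,
) -> Dict[str, Point]:
    """Return the annotated joints plus one midpoint per skeleton edge.

    Assumption: an "intermediate point" is the midpoint of two connected
    joints. The thesis may define these points differently.
    """
    augmented = dict(joints)  # keep the original ground-truth joints
    for a, b in edges:
        (xa, ya), (xb, yb) = joints[a], joints[b]
        augmented[f"{a}-{b}-mid"] = ((xa + xb) / 2.0, (ya + yb) / 2.0)
    return augmented


# Toy ground-truth annotation in pixel coordinates.
joints = {"shoulder": (10.0, 20.0), "elbow": (30.0, 40.0), "wrist": (50.0, 40.0)}
targets = add_intermediate_points(joints)
```

In a heatmap-based model, each entry in `targets` would typically become one additional output channel during training, so the network is supervised on the limbs between joints as well as on the joints themselves.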
Date
2024
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney