Advanced Multimedia Computing for Human Pose Estimation

Chu, Shek Wai


Access status:

USyd Access

Type

Thesis

Thesis type

Doctor of Philosophy

Author/s

Chu, Shek Wai

Abstract

Human pose estimation (HPE), the task of recovering the configuration of the human body (such as joint locations) from images and videos, has applications across diverse fields. Accurate pose estimation nevertheless remains challenging because of complex factors such as joint relationships, diverse clothing, and cluttered backgrounds. This thesis tackles three critical HPE hurdles: underutilized joint dependencies, the lack of simple refinement networks, and the absence of long-range modeling. First, we propose an enhanced ResNet hourglass model that incorporates DenseNet modules to improve efficiency, enabling precise pose prediction with lower resource demands. In addition, we exploit more of the information in the ground-truth annotations by using intermediate points as additional output targets, which further improves prediction accuracy. Next, we introduce a set of straightforward iterative subnetworks. These serve as a robust refinement baseline that iteratively improves pose accuracy, particularly for challenging joints such as wrists and ankles. We also explore integrating subnetworks with large receptive fields, such as axial transformers, for further performance gains. Finally, we investigate self-attention and Graph Neural Networks (GNNs) for capturing long-range dependencies and joint relationships. We propose two distinct frameworks: the Self-Attention Query Skeleton Graph Neural Network (SASGNN) and Graph Attention Networks (GAT) with guided edge features. Both combine self-attention with GNNs, with the GAT framework offering a simpler yet competitive architecture. By addressing these challenges, the proposed methods achieve competitive HPE accuracy and perform particularly well on difficult body parts, paving the way for more precise and robust pose estimation systems across a range of application domains.
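The abstract does not say how the intermediate points are derived. As one plausible reading, the minimal sketch below assumes each intermediate point is the midpoint of a limb (a pair of connected joints) and is appended to the ground truth as an extra keypoint target. The joint names, the `LIMBS` list, and the `add_intermediate_points` helper are hypothetical illustrations, not taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical limb list: pairs of connected joints. The names and this
# particular subset of limbs are illustrative assumptions only.
LIMBS: List[Tuple[str, str]] = [
    ("l_shoulder", "l_elbow"),
    ("l_elbow", "l_wrist"),
    ("l_hip", "l_knee"),
    ("l_knee", "l_ankle"),
]


def add_intermediate_points(
    joints: Dict[str, Optional[Point]],
    limbs: List[Tuple[str, str]] = LIMBS,
) -> Dict[str, Optional[Point]]:
    """Return a copy of `joints` extended with one midpoint per limb.

    A midpoint is only computed when both endpoints are annotated;
    otherwise it is stored as None so it can be masked out of the loss.
    """
    augmented = dict(joints)  # keep the original joints unchanged
    for a, b in limbs:
        pa, pb = joints.get(a), joints.get(b)
        key = f"mid_{a}_{b}"
        if pa is None or pb is None:
            augmented[key] = None
        else:
            augmented[key] = ((pa[0] + pb[0]) / 2.0, (pa[1] + pb[1]) / 2.0)
    return augmented


if __name__ == "__main__":
    gt = {
        "l_shoulder": (10.0, 20.0),
        "l_elbow": (14.0, 30.0),
        "l_wrist": (18.0, 40.0),
        "l_hip": (12.0, 50.0),
        "l_knee": None,  # unannotated joint
        "l_ankle": (13.0, 90.0),
    }
    aug = add_intermediate_points(gt)
    print(aug["mid_l_elbow_l_wrist"])  # (16.0, 35.0)
    print(aug["mid_l_hip_l_knee"])     # None (knee is unannotated)
```

Under this assumption, the network would predict a heatmap for each extra point alongside the original joints. Leaving a midpoint as None when either endpoint is missing avoids training on fabricated targets.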

Date

2024

Rights statement

The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.

Faculty/School

Faculty of Engineering, School of Computer Science

Awarding institution

The University of Sydney

Subjects

Advanced Multimedia Computing
Deep Learning
Computer Vision
Human Pose Estimation
Convolutional Neural Network
Graph Neural Network