Exploring Enhanced Motion Modeling Methods for Action Recognition
Field | Value | Language |
dc.contributor.author | Zhuang, Peiqin | |
dc.date.accessioned | 2024-05-30T03:38:51Z | |
dc.date.available | 2024-05-30T03:38:51Z | |
dc.date.issued | 2024 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/32601 | |
dc.description | Includes publication | |
dc.description.abstract | This thesis addresses three key issues in action recognition through enhanced motion modeling: handling complex motion variations, improving pseudo-label quality in semi-supervised settings, and incorporating explicit motion modeling into transformers. First, we propose to capture proper motion information, since motion dynamics such as moving tempo and action amplitude can vary considerably across video clips. To this end, we introduce a Motion Diversification and Selection (MoDS) module that generates diversified spatiotemporal motion features and selects the most appropriate motion representation for categorizing the input video. Second, we propose to improve pseudo-label quality in semi-supervised action recognition. Previous methods use only a single network to generate pseudo labels, but a single network is limited in its capacity to capture different motion patterns simultaneously. To this end, we advocate jointly training a pair of heterogeneous networks, i.e., a 2D CNN and a 3D CNN, to characterize different specific motion patterns, and then apply a label propagation strategy within and across these networks to refine the pseudo labels. Third, we propose to perform explicit motion modeling for transformers. We observe that transformer-based methods underperform on motion-sensitive datasets, indicating their limited capacity for temporal modeling. We also note that the conventional motion representation, namely the cost volume, is quite similar to the affinity matrix defined in self-attention, yet possesses a much stronger capacity for motion modeling. We therefore examine the essential properties of the cost volume for effective motion modeling and integrate them into self-attention to enhance motion representation. We have conducted comprehensive experiments on widely used datasets to confirm the effectiveness of the proposed methods, and our approaches prove superior to other advanced methods under different scenarios. | en_AU |
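The abstract's observation that a cost volume resembles the affinity matrix of self-attention can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes, not the thesis's actual model: both quantities are dot-product similarity maps over spatial locations, differing mainly in whether they compare a frame with itself (affinity) or with the next frame (cost volume).

```python
import numpy as np

# Toy per-frame features: (H*W, C) rows of C-dim descriptors per location.
# Shapes and values are illustrative, not from the thesis.
np.random.seed(0)
H = W = 4
C = 8
f_t = np.random.randn(H * W, C)    # features of frame t
f_t1 = np.random.randn(H * W, C)   # features of frame t+1

# Self-attention affinity: scaled dot products WITHIN one frame,
# normalized row-wise with a softmax.
scores = f_t @ f_t.T / np.sqrt(C)
scores -= scores.max(axis=1, keepdims=True)  # numerical stability
affinity = np.exp(scores)
affinity /= affinity.sum(axis=1, keepdims=True)

# Cost volume: the same scaled dot-product similarity, but computed
# ACROSS frames, so row i scores where location i likely moved to.
cost_volume = f_t @ f_t1.T / np.sqrt(C)

# Both are (H*W, H*W) location-to-location similarity maps.
assert affinity.shape == cost_volume.shape == (H * W, H * W)
```

The structural similarity is why, as the abstract suggests, properties of the cost volume can plausibly be folded into self-attention: the matrices have identical shape and are built from the same dot-product primitive.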
dc.language.iso | en | en_AU |
dc.subject | Action Recognition | en_AU |
dc.subject | Video Classification | en_AU |
dc.subject | Motion Modeling | en_AU |
dc.title | Exploring Enhanced Motion Modeling Methods for Action Recognition | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Zhou, Luping | |
usyd.include.pub | Yes | en_AU |