Exploring Enhanced Motion Modeling Methods for Action Recognition
Field | Value | Language |
dc.contributor.author | Zhuang, Peiqin | |
dc.date.accessioned | 2024-05-30T03:38:51Z | |
dc.date.available | 2024-05-30T03:38:51Z | |
dc.date.issued | 2024 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/32601 | |
dc.description | Includes publication | |
dc.description.abstract | This thesis addresses three key issues in action recognition through enhanced motion modeling: handling complex motion variations, improving pseudo-label quality in semi-supervised settings, and incorporating explicit motion modeling into transformers. First, we propose to capture proper motion information, since motion dynamics such as moving tempo and action amplitude can vary considerably across video clips. To this end, we introduce a Motion Diversification and Selection (MoDS) module that generates diversified spatiotemporal motion features and selects the most appropriate motion representation for categorizing the input video. Second, we propose to improve pseudo-label quality in semi-supervised action recognition. Previous methods use only a single network to generate pseudo labels, but a single network is limited in its capacity to capture different motion patterns simultaneously. To this end, we advocate jointly training a pair of heterogeneous networks, i.e., a 2D CNN and a 3D CNN, to characterize different specific motion patterns, and then apply a label propagation strategy within and across these networks to refine the pseudo labels. Third, we propose to perform explicit motion modeling for transformers. We observe that transformer-based methods underperform on motion-sensitive datasets, indicating their limited capacity for temporal modeling. We also note that the conventional motion representation, namely the cost volume, is quite similar to the affinity matrix defined in self-attention, yet possesses a much stronger capacity for motion modeling. We therefore examine the essential properties of the cost volume for effective motion modeling and integrate them into self-attention to enhance motion representation. We have conducted comprehensive experiments on widely used datasets to confirm the effectiveness of the proposed methods, and our approaches prove superior to other advanced methods under different scenarios. | en_AU |
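The abstract's observation that a cost volume resembles the affinity matrix of self-attention can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes, not the thesis's actual model: both quantities are dot-product similarity maps over spatial locations, differing mainly in whether they compare a frame with itself (affinity) or with the next frame (cost volume).

```python
import numpy as np

# Toy per-frame features: (H*W, C) rows of C-dim descriptors per location.
# Shapes and values are illustrative, not from the thesis.
np.random.seed(0)
H = W = 4
C = 8
f_t = np.random.randn(H * W, C)    # features of frame t
f_t1 = np.random.randn(H * W, C)   # features of frame t+1

# Self-attention affinity: scaled dot products WITHIN one frame,
# normalized row-wise with a softmax.
scores = f_t @ f_t.T / np.sqrt(C)
scores -= scores.max(axis=1, keepdims=True)  # numerical stability
affinity = np.exp(scores)
affinity /= affinity.sum(axis=1, keepdims=True)

# Cost volume: the same scaled dot-product similarity, but computed
# ACROSS frames, so row i scores where location i likely moved to.
cost_volume = f_t @ f_t1.T / np.sqrt(C)

# Both are (H*W, H*W) location-to-location similarity maps.
assert affinity.shape == cost_volume.shape == (H * W, H * W)
```

The structural similarity is why, as the abstract suggests, properties of the cost volume can plausibly be folded into self-attention: the matrices have identical shape and are built from the same dot-product primitive.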
dc.language.iso | en | en_AU |
dc.subject | Action Recognition | en_AU |
dc.subject | Video Classification | en_AU |
dc.subject | Motion Modeling | en_AU |
dc.title | Exploring Enhanced Motion Modeling Methods for Action Recognition | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Zhou, Luping | |
usyd.include.pub | Yes | en_AU |