Data-efficient Visual Recognition and Localization

Wei, Fangyun

Access status: Open Access

Field	Value	Language
dc.contributor.author	Wei, Fangyun
dc.date.accessioned	2026-03-29T21:33:51Z
dc.date.available	2026-03-29T21:33:51Z
dc.date.issued	2026	en
dc.identifier.uri	https://hdl.handle.net/2123/35039
dc.description.abstract	Over the past decade, computer vision has progressed from visual understanding to visual generation. Visual understanding extracts semantic and geometric information from images and videos and supports applications such as autonomous driving, robotics, medical imaging, surveillance, and augmented reality. Object detection and segmentation are foundational visual understanding tasks. Detection localizes and classifies objects, while segmentation provides pixel-level predictions, including semantic and instance segmentation. Extending these tasks to videos introduces challenges such as motion blur, occlusion, appearance variation, and temporal consistency. In particular, video instance segmentation requires detecting, segmenting, and tracking objects across frames. These tasks provide structured representations that serve as reusable building blocks for downstream reasoning and decision-making. Despite advances in model capacity, modern vision systems rely heavily on large-scale labeled data. However, acquiring high-quality annotations is costly and labor-intensive, especially for dense video masks or domain-specific scenarios. Long-tail distributions and domain shifts further increase data collection difficulty. To address this limitation, data-efficient learning aims to achieve strong performance with limited labeled data by leveraging unlabeled data through semi-supervised learning, pseudo-labeling, consistency regularization, and pretraining. In this thesis, we study data-efficient learning for visual understanding and validate its effectiveness on object detection, video object segmentation, and video instance segmentation. We develop practical frameworks that reduce annotation dependency while maintaining competitive accuracy, advancing scalable vision systems under realistic data constraints.	en
dc.language.iso	en	en
dc.subject	data-efficient learning	en
dc.subject	computer vision	en
dc.subject	object detection	en
dc.subject	video instance segmentation	en
dc.subject	video object segmentation	en
dc.title	Data-efficient Visual Recognition and Localization	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Xu, Chang
usyd.include.pub	No	en