Show simple item record

| Field | Value | Language |
| dc.contributor.author | Wei, Fangyun | |
| dc.date.accessioned | 2026-03-29T21:33:51Z | |
| dc.date.available | 2026-03-29T21:33:51Z | |
| dc.date.issued | 2026 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/35039 | |
| dc.description.abstract | Over the past decade, computer vision has progressed from visual understanding to visual generation. Visual understanding extracts semantic and geometric information from images and videos and supports applications such as autonomous driving, robotics, medical imaging, surveillance, and augmented reality. Object detection and segmentation are foundational visual understanding tasks. Detection localizes and classifies objects, while segmentation provides pixel-level predictions, including semantic and instance segmentation. Extending these tasks to videos introduces challenges such as motion blur, occlusion, appearance variation, and temporal consistency. In particular, video instance segmentation requires detecting, segmenting, and tracking objects across frames. These tasks provide structured representations that serve as reusable building blocks for downstream reasoning and decision-making. Despite advances in model capacity, modern vision systems rely heavily on large-scale labeled data. However, acquiring high-quality annotations is costly and labor-intensive, especially for dense video masks or domain-specific scenarios. Long-tail distributions and domain shifts further increase data collection difficulty. To address this limitation, data-efficient learning aims to achieve strong performance with limited labeled data by leveraging unlabeled data through semi-supervised learning, pseudo-labeling, consistency regularization, and pretraining. In this thesis, we study data-efficient learning for visual understanding and validate its effectiveness on object detection, video object segmentation, and video instance segmentation. We develop practical frameworks that reduce annotation dependency while maintaining competitive accuracy, advancing scalable vision systems under realistic data constraints. | en |
| dc.language.iso | en | en |
| dc.subject | data-efficient learning | en |
| dc.subject | computer vision | en |
| dc.subject | object detection | en |
| dc.subject | video instance segmentation | en |
| dc.subject | video object segmentation | en |
| dc.title | Data-efficient Visual Recognition and Localization | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Xu, Chang | |
| usyd.include.pub | No | en |


There are no previous versions of the item available.