Self-supervised Visual Representation Learning
Access status:
USyd Access
Type
Thesis
Thesis type
Masters by Research
Author/s
Wang, Zhaoqing
Abstract
In general, large-scale annotated data are essential for training deep neural networks to learn visual features that perform well across computer vision applications. Unfortunately, annotations at this scale are difficult to obtain and require substantial money and human effort. This dependence on large-scale annotated data has become a crucial bottleneck in developing advanced intelligent perception systems. Self-supervised visual representation learning, a subset of unsupervised learning, has gained popularity because it avoids the high cost of annotation. A series of methods have designed various pretext tasks to learn general representations from unlabeled data and then use these representations for different downstream tasks. Although these methods have achieved great success, their pretext tasks suffer from label noise due to the lack of human-annotated supervision, and this noise harms transfer performance. This thesis discusses two types of noise in self-supervised learning and designs corresponding methods to alleviate their negative effects and learn transferable representations. First, in pixel-level self-supervised learning, pixel-level correspondences are prone to noise because of complicated context relationships (e.g., misleading pixels in the background). Second, two views of the same image share the foreground object as well as some background information. When optimizing the pretext task (e.g., contrastive learning), the model therefore tends to capture the foreground object and the noisy background information simultaneously. Such background information can harm transfer performance on downstream tasks, including image classification, object detection, and instance segmentation. To address these issues, our core idea is to leverage data regularities and prior knowledge.
Experimental results demonstrate that the proposed methods effectively alleviate the negative effects of label noise in self-supervised learning and surpass a series of previous methods.
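The abstract names contrastive learning as an example pretext task. The sketch below illustrates the generic InfoNCE objective commonly used for that task, written in pure Python for readability. It is not the method proposed in this thesis. The function names `l2_normalize` and `info_nce`, and the temperature of 0.2, are illustrative assumptions. Row i of `queries` and row i of `keys` stand for embeddings of two augmented views of the same image (the positive pair), and all other rows of `keys` act as negatives.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(queries: Sequence[Sequence[float]],
             keys: Sequence[Sequence[float]],
             temperature: float = 0.2) -> float:
    """Mean InfoNCE loss over a batch (illustrative sketch, not the thesis method).

    queries[i] and keys[i] are embeddings of two augmented views of image i
    (the positive pair); every keys[j] with j != i serves as a negative.
    """
    if len(queries) != len(keys):
        raise ValueError("queries and keys must have the same batch size")
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of view i to every key, sharpened by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the positive key (index i) as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    views_b = [[0.9, 0.1], [0.1, 0.9]]  # slightly perturbed second views
    print(info_nce(views_a, views_b))        # small: each positive is the closest key
    print(info_nce(views_a, views_b[::-1]))  # large: positives are mismatched
```

The loss only asks each view to be more similar to its partner view than to other images. As a result, any feature the two views share is rewarded, including background content that appears in both crops. This is the kind of background noise the abstract describes.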
Date
2022
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney