Show simple item record

Field | Value | Language
dc.contributor.author | Wang, Zhaoqing |
dc.date.accessioned | 2022-09-30T03:11:38Z |
dc.date.available | 2022-09-30T03:11:38Z |
dc.date.issued | 2022 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/29595 |
dc.description.abstract | In general, large-scale annotated data are essential for training deep neural networks to achieve better performance in visual feature learning for various computer vision applications. Unfortunately, annotations at this scale are difficult to obtain and costly in both money and human effort. The dependence on large-scale annotated data has become a crucial bottleneck in developing advanced intelligent perception systems. Self-supervised visual representation learning, a subset of unsupervised learning, has gained popularity because it avoids the high cost of annotated data. A series of methods have designed various pretext tasks to learn general representations from unlabeled data and use these representations for different downstream tasks. Although previous methods have achieved great success, a label noise problem exists in these pretext tasks because of the lack of human-annotated supervision, and it harms transfer performance. This thesis discusses two types of noise problem in self-supervised learning and designs corresponding methods to alleviate their negative effects and learn transferable representations. Firstly, in pixel-level self-supervised learning, the pixel-level correspondences are prone to noise because of complicated context relationships (e.g., misleading pixels in the background). Secondly, two views of the same image share the foreground object and some background information. When optimizing the pretext task (e.g., contrastive learning), the model easily captures both the foreground object and noisy background information. Such background information can be harmful to transfer performance on downstream tasks, including image classification, object detection, and instance segmentation. To address the above-mentioned issues, our core idea is to leverage data regularities and prior knowledge. Experimental results demonstrate that the proposed methods effectively alleviate the negative effects of label noise in self-supervised learning and surpass a series of previous methods. | en_AU
dc.language.iso | en | en_AU
dc.subject | Self-supervised Learning | en_AU
dc.subject | Computer Vision | en_AU
dc.subject | Dense Prediction | en_AU
dc.title | Self-supervised Visual Representation Learning | en_AU
dc.type | Thesis |
dc.type.thesis | Masters by Research | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Master of Philosophy M.Phil | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | LIU, TONGLIANG |
usyd.advisor | BAO, WEI |


There are no previous versions of the item available.