Cognition-Aware Deep Learning Models for Saliency Detection
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yan, Ke
Abstract
Saliency in an image is defined as the meaningful and attractive region corresponding to the human visual perception and cognition systems. Saliency detection is important in computer vision and is a fundamental step for many vision applications, such as image resizing, content-aware image cropping, action recognition and visual tracking. Mainstream deep learning models for saliency detection include end-to-end frameworks that use deep neural networks (DNNs) to automatically learn image features and saliency characteristics for the localization and segmentation of saliency. However, such methods face the following three challenges:
Challenge 1: As DNNs tend to learn abstract and intrinsic knowledge, detailed information (such as object boundary and shape) is inevitably lost in the deeper layers of DNNs. Therefore, DNNs have limited capability to deal with the irregular shapes and complicated boundaries of objects such as tumors and anatomic structures in medical images.
Challenge 2: In complex scenarios, mixed texture distributions between foreground and background impede DNNs in learning intrinsic and discriminative saliency characteristics. Therefore, DNNs are not capable enough to deal with such challenging images.
Challenge 3: DNNs, reliant on human visual perception, cannot differentiate primary saliency from secondary saliency, a distinction that requires the inclusion of human cognitive learning and thinking. The lack of human cognition is a common current challenge in saliency detection and other computer vision tasks, and is a major hurdle in evolving machine intelligence toward human intelligence. Some investigators have conducted pioneering research on cognition and proposed attentive modules to reflect human attention, which is a big step in the study of cognition. However, as such attention is only the first step of the human cognitive learning process, research on cognition is still in its infancy.
In this thesis, we aim to explore and propose cognition-aware deep learning models that incorporate human cognition into machine intelligence step by step, to tackle the three challenges in saliency detection. The contributions of this thesis are summarized as follows.
1. To address Challenge 1, we used superpixels as prior knowledge that encodes the boundary information of objects and transferred this prior knowledge to the working memory of DNNs. Such additional prior knowledge complements the previously learned knowledge of DNNs, guiding DNNs in the segmentation of saliency.
2. To address Challenge 2, consistent with the way humans acquire knowledge from multiple sources for appropriate decision making, our DNNs also learn saliency knowledge from different sources (such as sparse and dense labeling schemes). Therefore, our DNNs are not limited to one specific kind of knowledge and can retrieve saliency from more complicated scenarios using the multiple-source knowledge.
3. To address Challenge 3, we propose to mimic and embody the process of human cognitive thinking about images. Consistent with the natural process of saliency detection by humans, our DNNs progressively learn and encode saliency knowledge as working memory in three phases (‘Seeing’ - ‘Perceiving’ - ‘Cogitating’), which is a higher level of learning on top of the existing attentive learning.
Date
2020
Publisher
University of Sydney
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney