Modeling Fine-grained Long-range Visual Dependency for Deep Learning-based Medical Image Analysis
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Meng, Mingyuan
Abstract
Medical image analysis has gained substantial attention among researchers and clinicians. Its primary objective is to exploit diagnostic and prognostic information from medical images to support clinical decision-making and personalized treatment. It is a broad research topic covering a wide range of medical vision tasks, including both pixel-wise and image-wise medical image prediction tasks. Convolutional Neural Networks (CNNs) were widely used in early deep learning-based methods for their ability to extract hierarchical image features via translation-invariant convolution operations. Subsequently, transformers became popular in medical image analysis, as they can capture long-range visual dependency in medical images via self-attention operations with global connectivity. Moreover, other network backbones, such as Multi-layer Perceptrons (MLPs) and State Space Models (SSMs), have emerged as alternatives to transformers for modeling long-range visual dependency in medical images. This evolution of network backbones demonstrates the importance of modeling long-range visual dependency for medical image analysis. The objective of this thesis is to further investigate the modeling of long-range visual dependency for deep learning-based medical image analysis, specifically by identifying the gaps in existing methods and introducing new methods that advance medical image analysis via more effective long-range visual dependency modeling. First, this thesis presents the innovative use of transformers in both pixel-wise and image-wise medical image prediction tasks. Then, this thesis turns its focus to MLPs, where a novel MLP block is introduced to capture multi-range visual dependency. Finally, this thesis presents a comprehensive empirical investigation of MLPs in various pixel-wise medical image prediction tasks.
Date
2025
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney