Self-Supervised Intrinsic Representation Learning on Medical Images for Universal Foundation Models
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Ma, Yang | |
| dc.date.accessioned | 2026-03-03T22:36:43Z | |
| dc.date.available | 2026-03-03T22:36:43Z | |
| dc.date.issued | 2025 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/34930 | |
| dc.description | Includes publication | |
| dc.description.abstract | Deep learning has advanced medical imaging, yet clinical adoption is limited by scarce annotations, weak interpretability, and poor generalization. A key limitation of existing methods is their purely data-driven nature, which overlooks intrinsic properties of medical images such as anatomical symmetry, spatial hierarchies, and disease-specific structural priors. This thesis proposes a unified framework that embeds these priors into model architectures and training objectives, improving data efficiency, robustness, and interpretability. First, I introduce a Symmetry-Aware Cross-Attention (SACA) module for brain disease diagnosis with limited supervision. By performing cross-attention between original and flipped 3D brain volumes and applying contrastive pretraining, SACA captures hemispheric asymmetries and yields improved classification and lesion segmentation on multi-center MRI datasets. Second, I develop MSFormer, a multi-scale vision–language transformer for medical visual question answering. With Multi-Scale Positional Embedding and Grouped Attention, MSFormer fuses coarse anatomical context with fine-grained lesion details and aligns visual evidence with clinical questions, achieving state-of-the-art performance on medical VQA benchmarks. Finally, I propose MSCAMA, a scale-aware multi-agent framework that unifies hierarchical visual encoding, adaptive retrieval, and evidence-grounded reasoning using a scale-aware backbone and specialized agents for report understanding, pairwise comparison, and question answering. Experiments demonstrate consistent improvements in retrieval accuracy, clinical relevance, and reasoning quality. Together, these contributions advance interpretable, generalizable, and data-efficient medical imaging AI. | en |
| dc.language.iso | en | en |
| dc.subject | Medical Imaging | en |
| dc.subject | Self-supervised Learning | en |
| dc.subject | Foundation Models | en |
| dc.title | Self-Supervised Intrinsic Representation Learning on Medical Images for Universal Foundation Models | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Cai, Weidong | |
| usyd.include.pub | Yes | en |