Multi-Modality Fusion Convolutional Network (MM-FCN) for Prediction of Retinal Vein Occlusion using Swept Source Optical Coherence Tomography Angiography

Huo, Xinyu

Access status:

Open Access

Type

Thesis

Thesis type

Doctor of Philosophy

Author/s

Huo, Xinyu

Abstract

Retinal vein occlusion (RVO) is a complex vascular disorder and a major cause of vision loss. Inadequate assessment may result in the expansion of non-perfusion areas (NPAs), leading to severe complications, underscoring the need for accurate prognosis. Swept-source optical coherence tomography angiography (SS-OCTA) enables multimodal imaging by combining B-scans, which provide cross-sectional structural details, with Angio-flow scans, which capture en-face microvascular blood flow. Despite its advantages in resolution and depth, SS-OCTA has been less explored in RVO than in other retinal diseases, partly due to the lower prevalence of RVO and the interpretive challenges of multimodal data. This study investigates both traditional machine learning and advanced deep learning strategies for SS-OCTA-based prognosis prediction in RVO. A radiomics pipeline with conventional classifiers is first developed and compared with ConvNeXt-based networks on 2D and 3D data. To enhance multimodal learning, a 3D multi-modality alternating dynamic fusion ConvNeXt (mmDFC) is proposed, which alternately trains B-scan and Angio-flow inputs to mitigate intermodal interference and dynamically fuse structural and vascular features. A 2D mmDFC variant is also examined. Furthermore, a 2D multi-modality Dual-Branch Correlation-driven Fusion ConvNeXt (mmDCFC) is introduced, employing convolutional and Transformer-based encoders to separately extract modality-specific and cross-modal features, achieving superior performance among 2D models. Extensive experiments on SS-OCTA datasets demonstrate that both 2D and 3D multimodal approaches outperform single-modality and radiomics baselines. In particular, the 3D mmDFC achieved the best overall performance, highlighting the importance of multimodal dynamic fusion and alternating training for accurate RVO prognosis classification.

Date

2025

Rights statement

The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.

Faculty/School

Faculty of Engineering, School of Civil Engineering

Awarding institution

The University of Sydney

Subjects

Multi-modality learning
deep learning
Radiomics
Retinal Vessel Occlusion
machine learning