Longitudinal Chest X-ray Image Generation via Autoregression Model and Diffusion-based Model
Access status:
Open Access
Type
Thesis
Thesis type
Masters by Research
Author/s
Wang, Yiran
Abstract
Longitudinal chest X-ray (CXR) analysis is central to clinical follow-up of pulmonary diseases, where radiologists compare prior and current scans to assess subtle lesion changes while preserving thoracic anatomy. This thesis studies longitudinal CXR generation: synthesizing a follow-up radiograph from a reference CXR and a textual description of disease progression. Although transformer-based generators model global context effectively, I observe a corner/edge bias in attention, where attention drifts toward image boundaries. This is problematic because meaningful changes are often small and spatially localized. To address this, I propose Gaussian-Biased Causal Attention (GBCA), a lightweight module that injects lesion-centric Gaussian spatial priors into selected transformer layers to reduce boundary bias and improve lesion-aligned control. GBCA is architecture-agnostic and can be integrated into both autoregressive and diffusion transformer backbones without changing base parameters. In the autoregressive setting, I incorporate GBCA into a decoder-only multimodal transformer and validate its generality on a second autoregressive editing backbone by freezing the generator and training only GBCA. In the diffusion setting, I extend GBCA to DiT-based longitudinal generation by identifying vital layers for prior injection to better balance global structure preservation and local lesion editing. Experiments on longitudinal CXR datasets show that GBCA consistently improves image fidelity and clinical faithfulness. I also introduce lesion-aware metrics, including Attn-IoU, Corner Bias Index, and Edge Activation Ratio, to quantify attention to relevant regions. Results show that GBCA improves lesion-aligned attention, reduces corner/edge bias, and generates more anatomically consistent follow-up CXRs with better alignment to progression text.
Overall, this thesis provides a practical mechanism for spatially grounded control in transformer-based medical image generation.
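Illustrative note: the abstract does not give GBCA's formulation, so the following is only a minimal sketch of one common way a Gaussian spatial prior can be injected into causal attention, namely by adding a log-Gaussian bias over key positions to the attention logits before the softmax. The function names, the `center`, `sigma`, and `strength` parameters, and the single-head NumPy setup are all assumptions made for illustration and are not taken from the thesis. In particular, the thesis describes GBCA as trainable, whereas `strength` here is a fixed scalar.

```python
import numpy as np


def gaussian_log_prior(height, width, center, sigma):
    """Log of an unnormalised 2D Gaussian over a token grid, flattened row-major.

    `center` is a hypothetical (row, col) lesion location in grid coordinates.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = center
    sq_dist = (ys - cy) ** 2 + (xs - cx) ** 2
    return (-sq_dist / (2.0 * sigma ** 2)).reshape(-1)


def gaussian_biased_causal_attention(q, k, v, log_prior, strength=1.0):
    """Single-head causal attention with an additive spatial bias on the keys.

    q, k, v: arrays of shape (n_tokens, dim); log_prior: shape (n_tokens,).
    Returns the attended values and the attention weights.
    """
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    # Assumed injection point: bias every query's logits toward keys near the lesion.
    logits = logits + strength * log_prior[None, :]
    # Causal mask: token i may only attend to tokens 0..i.
    causal = np.tril(np.ones((n, n), dtype=bool))
    logits = np.where(causal, logits, -np.inf)
    # Numerically stable softmax over keys.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

With `strength=0.0` the function reduces to ordinary causal attention, which is one way such a bias could be added to a frozen backbone without altering its base parameters.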
Date
2025
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Electrical and Information Engineering
Awarding institution
The University of Sydney