Deep Generative Modeling for Chest X-ray Interpretation and Synthesis
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yang, Ling
Abstract
Multimodal generative modeling has significantly advanced chest X-ray (CXR) interpretation and synthesis, but key challenges remain. These include bone suppression for CXR-based diagnosis, unifying interpretive and generative tasks, expert radiologist evaluation of unified medical models, and effective grounding of visual features for diagnosis. To tackle these issues, we leverage generative models such as the Vector-Quantized Generative Adversarial Network (VQGAN) and Stable Diffusion, along with large language models such as mPLUG-Owl and Qwen-VL, to present four novel contributions: (i) a bone suppression framework that improves disease diagnosis by reducing the visual interference of ribs in CXRs; (ii) a unified large language model (LLM) that supports report generation, visual question answering (VQA), and image synthesis, offering an end-to-end solution for comprehensive CXR understanding; (iii) a comprehensive evaluation of medical LLMs that combines computational metrics with radiologists' assessments; and (iv) an instruction-tuned multimodal model for accurate disease classification and visual grounding.
Date
2026
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Electrical and Information Engineering
Awarding institution
The University of Sydney