
Field | Value | Language
dc.contributor.author | Shao, Lexuan |
dc.date.accessioned | 2026-03-24T09:00:12Z |
dc.date.available | 2026-03-24T09:00:12Z |
dc.date.issued | 2026 | en
dc.identifier.uri | https://hdl.handle.net/2123/35026 |
dc.description.abstract | Evidence on the quality and safety of large language models (LLMs) used to support patient communication remains limited. Recent advances in AI have increased interest in tools that assist patients during transitions of care, such as after hospital discharge. However, many evaluations rely on language similarity metrics or expert judgement without considering patient preferences or safety implications. This thesis develops and evaluates a real-time question–answering (QA) system to support patients following hospital discharge. The QA system used two language models (GPT-4o and QWen) within a retrieval-augmented generation (RAG) framework and could incorporate domain-specific knowledge bases, including MIMIC-IV-Note and a synthetic clinical question–answer dataset. The system was evaluated using 111 patient questions derived from 37 discharge summaries from MIMIC-IV. Three studies examined patient preference, response safety, and language similarity metrics. In study one, patient experts ranked responses from QA system configurations and clinical expert answers based on preference and perceived empathy. AI-generated responses were frequently preferred, particularly when RAG and clinical question datasets were included. In study two, clinical experts assessed the likelihood and severity of safety issues. Unsafe responses were relatively rare and comparable between AI-generated and clinician answers. In study three, language similarity metrics (BLEU, ROUGE, and BERTScore) showed no correlation with patient preference or safety outcomes. These findings suggest that QA systems using discharge information can produce responses acceptable to patients and generally safe under certain configurations. The results highlight limitations of standard language metrics and demonstrate the value of structured safety evaluation. Future systems may benefit from recognising question intent and routing queries to configurations optimised for retrieval, safety, or explanation. | en
dc.language.iso | en | en
dc.rights | The author retains copyright of this thesis |
dc.subject | Large Language Models | en
dc.subject | Retrieval-Augmented Generation | en
dc.subject | Patient Communication | en
dc.subject | Clinical Safety | en
dc.subject | Question Answering Systems | en
dc.subject | Hospital Discharge Communication | en
dc.title | Evaluating the Quality and Safety of Retrieval-Augmented Large Language Models for a Post-Discharge Patient Question Answering System | en
dc.type | Thesis |
dc.type.thesis | Masters by Research | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Medicine and Health::The University of Sydney School of Public Health | en
usyd.degree | Master of Philosophy M.Phil | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Dunn, Adam |
usyd.include.pub | No | en


