The application of multi-modal machine learning to identify new biomarkers for neurological diseases in large biomedical datasets
Field | Value | Language |
dc.contributor.author | Allwright, Michael | |
dc.date.accessioned | 2024-06-21T02:13:27Z | |
dc.date.available | 2024-06-21T02:13:27Z | |
dc.date.issued | 2024 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/32697 | |
dc.description | Includes publication | |
dc.description.abstract | The human brain is the most complex structure on the planet and presents significant challenges in understanding and treating neurological diseases (NDs). Despite technological advancements, our understanding of NDs remains limited. This thesis seeks to apply machine learning (ML) to high-dimensional data to identify novel risk factors and build improved prediction models for NDs. Significant technological advancements have resulted in large volumes of high-resolution, personalized, multi-modal medical data. These data enable high-powered studies that deepen our understanding of NDs. The UK Biobank (UKB) has collected multi-modal data from over 500,000 participants, helping to accelerate our understanding of NDs. ML is a branch of artificial intelligence that enables computers to learn from data, identifying patterns and relationships to make predictions. ML is particularly well-suited to modelling the high-dimensional data from repositories like the UKB, outperforming traditional statistical methods. IDEARs applies ML (XGBoost) to the UKB to predict disease risk, and uses tree-based SHAP for feature ranking. It identifies top risk factors for NDs and provides supplementary analysis for effect size and statistical significance. ReTimeML automates the estimation of retention times for ceramide (Cer) and sphingomyelin (SM) profiles in liquid chromatography/mass spectrometry (LC-MS/MS) analysis, efficiently determining the concentration of these sphingolipid species. Key findings include the impact of IGF1 and inflammatory factors on PD risk, lifestyle and socioeconomic factors in DPN, and the significance of the APOE ε4 allele in AD, along with liver pathology as a novel risk factor. We confirm the association of anandamide (AEA) with SCZ and demonstrate the high performance of the IDEARs platform in risk profiling SCZ patients using a combination of biofluids, endocannabinoids (eCBs) and sphingolipids (SLs). | en_AU |
dc.language.iso | en | en_AU |
dc.subject | neurodegenerative disease | en_AU |
dc.subject | schizophrenia | en_AU |
dc.subject | machine learning | en_AU |
dc.subject | UK Biobank | en_AU |
dc.subject | multi-modal | en_AU |
dc.title | The application of multi-modal machine learning to identify new biomarkers for neurological diseases in large biomedical datasets | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Medicine and Health::School of Medical Sciences | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | GUENNEWIG, BORIS | |
usyd.include.pub | Yes | en_AU |