Novel deep learning-based methods for improved prediction and feature-learning in high-throughput proteomic and transcriptomic data

Geddes, Thomas Andrew

Access status: Open Access

Field	Value	Language
dc.contributor.author	Geddes, Thomas Andrew
dc.date.accessioned	2025-07-14T06:08:34Z
dc.date.available	2025-07-14T06:08:34Z
dc.date.issued	2025	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/34107
dc.description.abstract	The rise of high-throughput omics technologies has allowed researchers to measure biomolecular species of interest en masse at the sample or individual cell level. These technologies, including bulk and single-cell transcriptomics, mass spectrometry (MS) proteomics, and other MS techniques capable of quantifying post-translational modifications (PTMs) of proteins, produce extremely large datasets, presenting new opportunities and challenges for data analysis. These datasets may capture complex relationships in the regulation of genes, proteins and PTMs. However, sophisticated techniques are required both to extract this information and to overcome the pathologies and challenges that arise. Issues such as missingness, biological noise and the curse of dimensionality make these datasets non-trivial to analyse. This thesis explores different approaches to analysing high-throughput datasets, extracting useful information and addressing some of the challenges involved. Chapter 2 introduces Thunderbolt, a traditional analysis pipeline which provides tools for diagnosing and remedying pathologies inherent to specific MS proteomics datasets, for differential expression analysis, and for downstream analysis. The chapter demonstrates a full analysis workflow to address a specific hypothesis and discusses approaches to dealing with dataset pathologies. Chapter 3 introduces scCCESS, a flexible autoencoder-based framework for improving the performance of clustering methods when applied to single-cell RNA-seq datasets by diversifying and simplifying inputs to the chosen clustering algorithm. Chapter 4 introduces ConGregatE-PPI, a predictive ensemble artificial neural network model which leverages complementary information from multiple datasets to improve prediction of protein-protein interactions in a specific biological context.	en_AU
dc.language.iso	en	en_AU
dc.subject	omics	en_AU
dc.subject	proteomics	en_AU
dc.subject	transcriptomics	en_AU
dc.subject	deep learning	en_AU
dc.subject	bioinformatics	en_AU
dc.title	Novel deep learning-based methods for improved prediction and feature-learning in high-throughput proteomic and transcriptomic data	en_AU
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en_AU
usyd.faculty	SeS faculties schools::Faculty of Science::School of Life and Environmental Sciences	en_AU
usyd.degree	Doctor of Philosophy Ph.D.	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Burchfield, James