Novel deep learning-based methods for improved prediction and feature-learning in high-throughput proteomic and transcriptomic data

Geddes, Thomas Andrew

Access status

Open Access

Type

Thesis

Thesis type

Doctor of Philosophy

Author/s

Geddes, Thomas Andrew

Abstract

The rise of high-throughput omics technologies has allowed researchers to measure biomolecular species of interest en masse at the sample or individual cell level. These technologies, including bulk and single-cell transcriptomics, mass spectrometry (MS) proteomics, and other MS techniques capable of quantifying post-translational modifications (PTMs) of proteins, produce extremely large datasets, presenting new opportunities and challenges for data analysis. These datasets may capture complex relationships in the regulation of genes, proteins and PTMs. However, sophisticated techniques must be developed both to extract this information and to overcome the pathologies and challenges that arise. Issues such as missingness, biological noise and the curse of dimensionality make these datasets non-trivial to analyse. This thesis explores different approaches to analysing high-throughput datasets, extracting useful information and addressing some of the challenges involved. Chapter 2 introduces Thunderbolt, a traditional analysis pipeline which provides tools for the diagnosis and remedy of pathologies inherent to specific MS proteomics datasets, for differential expression analysis, and for downstream analysis. The chapter demonstrates a full analysis workflow to address a specific hypothesis and discusses approaches to dealing with dataset pathologies. Chapter 3 introduces scCCESS, a flexible autoencoder-based framework for improving the performance of clustering methods when applied to single-cell RNA-seq datasets by diversifying and simplifying inputs to the chosen clustering algorithm. Chapter 4 introduces ConGregatE-PPI, a predictive ensemble artificial neural network model which leverages complementary information from multiple datasets to improve prediction of protein-protein interactions in a specific biological context.

Date

2025

Rights statement

The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.

Faculty/School

Faculty of Science, School of Life and Environmental Sciences

Awarding institution

The University of Sydney

Subjects

omics
proteomics
transcriptomics
deep learning
bioinformatics