Novel Preprocessing Approaches for Omics Data Types and Their Performance Evaluation

Strbenac, Dario

Access status: Open Access

Field	Value	Language
dc.contributor.author	Strbenac, Dario
dc.date.accessioned	2016-12-06
dc.date.available	2016-12-06
dc.date.issued	2016-04-08
dc.identifier.uri	http://hdl.handle.net/2123/16007
dc.description.abstract	A diverse range of high-dimensional datasets has recently become available to help elucidate the functioning of biological systems and defects within those systems leading to disease. All of these new technologies come with the challenges of determining how the raw data should be efficiently processed or normalised and, subsequently, how the data can best be summarised for more complex downstream analysis. There are many approaches to summarising and normalising omics data, with new methods frequently being developed. To date, there has not been a comprehensive evaluation of existing methods for many omics data types. This thesis focusses on systematically evaluating existing methods for three different types of omics data and, having identified limitations in the current methods, also proposes new approaches to improve their quality. Firstly, CAGE-seq data are considered. A two-stage method based on a novel region-finding algorithm followed by a classifier that integrates sequence patterns surrounding the identified regions is shown to possess superior performance to two existing methods. Similarly, a novel data summarisation approach to gene expression data, which integrates changes in location and scale into a unified metric, demonstrates benefits in two-class classification problems. The error rates are found to be competitive with existing methods, and the feature selection has higher stability and increased biological relevance. Finally, in the proteomics setting, there are many choices for how to summarise peptides to proteins, as well as issues relating to batch effects and whether internal controls are necessary. By developing a broad variety of performance metrics, and an accompanying web-based framework, novel recommendations about peptide-to-protein summaries and batch correction algorithms are made, and a surprising result regarding the necessity of internal standards is revealed.	en_AU
dc.subject	classification	en_AU
dc.subject	bioinformatics	en_AU
dc.subject	normalisation	en_AU
dc.subject	differential distribution	en_AU
dc.subject	proteomics	en_AU
dc.title	Novel Preprocessing Approaches for Omics Data Types and Their Performance Evaluation	en_AU
dc.type	Thesis	en_AU
dc.date.valid	2016-01-01	en_AU
dc.type.thesis	Doctor of Philosophy	en_AU
usyd.faculty	Faculty of Science, School of Mathematics and Statistics	en_AU
usyd.degree	Doctor of Philosophy Ph.D.	en_AU
usyd.awardinginst	The University of Sydney	en_AU