Novel Preprocessing Approaches for Omics Data Types and Their Performance Evaluation
Field | Value | Language |
dc.contributor.author | Strbenac, Dario | |
dc.date.accessioned | 2016-12-06 | |
dc.date.available | 2016-12-06 | |
dc.date.issued | 2016-04-08 | |
dc.identifier.uri | http://hdl.handle.net/2123/16007 | |
dc.description.abstract | A diverse range of high-dimensional datasets has recently become available to help elucidate the functioning of biological systems and the defects within those systems that lead to disease. Each of these new technologies brings the challenges of determining how the raw data should be efficiently processed or normalised and, subsequently, how the data can best be summarised for more complex downstream analysis. There are many approaches to summarising and normalising omics data, and new methods are frequently being developed. To date, there has been no comprehensive evaluation of existing methods for many omics data types. This thesis focusses on systematically evaluating existing methods for three different types of omics data and, having identified limitations in the current methods, proposes new approaches to improve their quality. Firstly, CAGE-seq data are considered. A two-stage method, based on a novel region-finding algorithm followed by a classifier that integrates the sequence patterns surrounding the identified regions, is shown to outperform two existing methods. Similarly, a novel data summarisation approach for gene expression data, which integrates changes in location and scale into a unified metric, demonstrates benefits in two-class classification problems. Its error rates are competitive with those of existing methods, and its feature selection has higher stability and greater biological relevance. Finally, in the proteomics setting, there are many choices for how to summarise peptides to proteins, as well as issues relating to batch effects and to whether internal controls are necessary. By developing a broad variety of performance metrics and an accompanying web-based framework, the thesis makes novel recommendations about peptide-to-protein summaries and batch correction algorithms, and reveals a surprising result regarding the necessity of internal standards. | en_AU |
dc.subject | classification | en_AU |
dc.subject | bioinformatics | en_AU |
dc.subject | normalisation | en_AU |
dc.subject | differential distribution | en_AU |
dc.subject | proteomics | en_AU |
dc.title | Novel Preprocessing Approaches for Omics Data Types and Their Performance Evaluation | en_AU |
dc.type | Thesis | en_AU |
dc.date.valid | 2016-01-01 | en_AU |
dc.type.thesis | Doctor of Philosophy | en_AU |
usyd.faculty | Faculty of Science, School of Mathematics and Statistics | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |