Field | Value | Language
dc.contributor.author | Strbenac, Dario |
dc.date.accessioned | 2016-12-06 |
dc.date.available | 2016-12-06 |
dc.date.issued | 2016-04-08 |
dc.identifier.uri | http://hdl.handle.net/2123/16007 |
dc.description.abstract | A diverse range of high-dimensional datasets has recently become available to help elucidate the functioning of biological systems and the defects within those systems that lead to disease. All of these new technologies come with the challenges of determining how the raw data should be efficiently processed or normalised and, subsequently, how the data can best be summarised for more complex downstream analysis. There are many approaches to summarising and normalising omics data, and new methods are frequently being developed. To date, there has not been a comprehensive evaluation of existing methods for many omics data types. This thesis focusses on systematically evaluating existing methods for three different types of omics data and, having identified limitations in the current methods, also proposes new approaches to improve their quality. Firstly, CAGE-seq data are considered. A two-stage method, based on a novel region-finding algorithm followed by a classifier that integrates sequence patterns surrounding the identified regions, is shown to outperform two existing methods. Similarly, a novel data summarisation approach for gene expression data, which integrates changes in location and scale into a unified metric, demonstrates benefits in two-class classification problems. The error rates are found to be competitive with existing methods, and the feature selection is more stable and more biologically relevant. Finally, in the proteomics setting, there are many choices for how to summarise peptides into proteins, as well as issues relating to batch effects and whether internal controls are necessary. By developing a broad variety of performance metrics and an accompanying web-based framework, the thesis makes novel recommendations on peptide-to-protein summarisation and batch correction algorithms, and reveals a surprising result regarding the necessity of internal standards. | en_AU
dc.subject | classification | en_AU
dc.subject | bioinformatics | en_AU
dc.subject | normalisation | en_AU
dc.subject | differential distribution | en_AU
dc.subject | proteomics | en_AU
dc.title | Novel Preprocessing Approaches for Omics Data Types and Their Performance Evaluation | en_AU
dc.type | Thesis | en_AU
dc.date.valid | 2016-01-01 | en_AU
dc.type.thesis | Doctor of Philosophy | en_AU
usyd.faculty | Faculty of Science, School of Mathematics and Statistics | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
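The abstract describes a summarisation of gene expression data that integrates changes in location and scale into a single metric for two-class problems. The exact statistic used in the thesis is not given in this record, so the Python sketch below is a hypothetical illustration only, not the author's method. It assumes the median as the location measure, the median absolute deviation (MAD) as the scale measure, and an unweighted sum of their absolute between-class differences as the combined score. The names `location_scale_score` and `rank_features` are invented for this example.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: a robust measure of scale."""
    m = median(values)
    return median(abs(v - m) for v in values)


def location_scale_score(group_a, group_b):
    """Hypothetical combined score (an assumption, not the thesis's metric):
    absolute change in median (location) plus absolute change in MAD (scale).
    A larger score means the two classes' distributions differ more."""
    location_change = abs(median(group_a) - median(group_b))
    scale_change = abs(mad(group_a) - mad(group_b))
    return location_change + scale_change


def rank_features(data, labels, top_k):
    """Rank features by the combined score and keep the top_k.

    data: dict mapping feature name -> list of values, one per sample.
    labels: list of class labels (exactly two distinct values), same order.
    Returns (top feature names, dict of all scores)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    scores = {}
    for name, values in data.items():
        a = [v for v, y in zip(values, labels) if y == classes[0]]
        b = [v for v, y in zip(values, labels) if y == classes[1]]
        scores[name] = location_scale_score(a, b)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


if __name__ == "__main__":
    # Toy data: four samples per class (values invented for illustration).
    labels = ["healthy"] * 4 + ["disease"] * 4
    data = {
        "geneMean": [1, 2, 1, 2, 5, 6, 5, 6],    # location shift only
        "geneSpread": [3, 3, 3, 3, 0, 6, 1, 5],  # same median, larger spread
        "geneFlat": [2, 2, 2, 2, 2, 2, 2, 2],    # no change
    }
    top, scores = rank_features(data, labels, top_k=2)
    print(top)  # ['geneMean', 'geneSpread']
```

In this toy example both classes of `geneSpread` have a mean of 3, so a mean-only score would rank it level with `geneFlat`; the scale term is what lets it be selected. Because the location and scale components are simply added without weighting or standardisation, a practical score would likely need to rescale them, and the unweighted sum is used here only to keep the illustration short.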