Statistical methods for the analysis and interpretation of RNA-Seq data

Patrick, Ellis

Access status: Open Access

Field	Value	Language
dc.contributor.author	Patrick, Ellis
dc.date.accessioned	2014-04-29
dc.date.available	2014-04-29
dc.date.issued	2013-10-22
dc.identifier.uri	http://hdl.handle.net/2123/10438
dc.description.abstract	In the post-genomic era, sequencing technologies have become a vital tool in the global analysis of biological systems. In particular, RNA-Seq, the sequencing of messenger RNA, has the potential to answer many diverse and interesting questions about the inner workings of cells. Despite the decreasing cost of sequencing, most RNA-Seq experiments still suffer from low replication. The statistical methodology for analysing low-replicate RNA-Seq experiments is still in its infancy and has room for further development. Incorporating additional information from publicly accessible databases may offer a plausible way to overcome the shortcomings of low replication. Not only could this additional information improve the ability to detect statistically significant signal, but the resulting signal should also be more biologically interpretable. This thesis addresses three distinct statistical problems that arise when processing and analysing RNA-Seq data. First, the use of experimental data to customise gene annotations is proposed. When customised annotations are used to summarise read counts, the resulting measures of transcript abundance contain more information than those from alternative summarisation approaches and show improved concordance with qRT-PCR data. Second, a moderation methodology that exploits external estimates of variation is developed to address small-sample differential expression analysis. This approach performs favourably against existing approaches in terms of gene rankings and sensitivity. Third, with the aim of identifying groups of miRNA-mRNA regulatory relationships, a framework for integrating various databases of prior knowledge with small-sample miRNA-Seq and mRNA-Seq data is outlined. This framework appears to identify more signal than simpler approaches and also provides highly interpretable models of miRNA-mRNA regulation.
To conclude, a small-sample miRNA-Seq and mRNA-Seq experiment is presented that seeks to discover miRNA-mRNA regulatory relationships associated with the loss of Notch2 function and its links to neurodegeneration. This experiment is used to illustrate the methodologies developed in this thesis.	en
dc.rights	The author retains copyright of this thesis
dc.title	Statistical methods for the analysis and interpretation of RNA-Seq data	en
dc.type	Thesis	en
dc.date.valid	2014-01-01	en
dc.type.thesis	Doctor of Philosophy	en
usyd.faculty	Faculty of Science, School of Mathematics and Statistics	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en