Statistical methods for the analysis and interpretation of RNA-Seq data
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Patrick, Ellis
Abstract
In the post-genomic era, sequencing technologies have become a vital tool in the global analysis of biological systems. RNA-Seq, the sequencing of messenger RNA, in particular has the potential to answer many diverse and interesting questions about the inner workings of cells. Despite the decreasing cost of sequencing, the majority of RNA-Seq experiments still suffer from low replication. The statistical methodology for dealing with low-replicate RNA-Seq experiments is still in its infancy and has room for further development. Incorporating additional information from publicly accessible databases may provide a plausible avenue to overcome the shortcomings of low replication. Not only could this additional information improve the ability to find statistically significant signal, but this signal should also be more biologically interpretable. This thesis is separated into three distinct statistical problems that arise when processing and analysing RNA-Seq data. Firstly, the use of experimental data to customise gene annotations is proposed. When customised annotations are used to summarise read counts, the corresponding measures of transcript abundance include more information than alternative summarisation approaches and offer improved concordance with qRT-PCR data. A moderation methodology that exploits external estimates of variation is then developed to address the issue of small sample differential expression analysis. This approach performs favourably against existing approaches when comparing gene rankings and sensitivity. With the aim of identifying groups of miRNA-mRNA regulatory relationships, a framework for integrating various databases of prior knowledge with small sample miRNA-Seq and mRNA-Seq data is then outlined. This framework appears to identify more signal than simpler approaches and also provides highly interpretable models of miRNA-mRNA regulation.
To conclude, a small sample miRNA-Seq and mRNA-Seq experiment is presented that seeks to discover miRNA-mRNA regulatory relationships associated with loss of Notch2 function and its links to neurodegeneration. This experiment is used to illustrate the methodologies developed in this thesis.
Date
2013-10-22
Licence
The author retains copyright of this thesis
Faculty/School
Faculty of Science, School of Mathematics and Statistics
Awarding institution
The University of Sydney