Recoding of Markov Processes in Phylogenetic Models
Access status: USyd Access
Type: Thesis
Thesis type: Doctor of Philosophy
Author/s: Vera Ruiz, Victor
Abstract
Under a Markov model of evolution, lumping the state space (S) into fewer groups has historically been used to focus on specific types of substitutions or to reduce compositional heterogeneity and saturation. However, working with a reduced state space (S′) may yield misleading results unless the Markov property is preserved. A Markov process X(t) is lumpable with respect to S′ if the reduced process X′(t) on S′ is also Markovian. The aim of this thesis is to develop a test that detects whether a given X(t) is lumpable with respect to a given S′. The test should accommodate any non-trivial S′ and should not depend on evolutionary assumptions such as stationarity, homogeneity or reversibility (SHR conditions) over a phylogenetic tree. We developed and compared three tests of lumpability for SHR Markov processes on two taxa: one using an ad hoc statistic, based on an index whose distribution is approximated by bootstrapping; one based on a test proposed specifically for Markov chains; and one using a likelihood-ratio (LR) test. We show that the LR test is more powerful than the other two, and that under SHR conditions it can be applied to every pair of taxa in binary trees with more than two taxa. We then generalize the LR test to cases where the SHR conditions may not hold, and show that its test statistic approximately follows a chi-square distribution with degrees of freedom equal to twice the number of distinct rate matrices in the tree. In all cases, we show that if X(t) is lumpable, the estimates obtained for X′(t) agree with those obtained for X(t), whereas if X(t) is not lumpable, these estimates can differ substantially. We conclude that lumping S may bias phylogenetic estimates if the original X(t) is not lumpable. Accordingly, lumpability should be tested before any phylogenetic analysis of recoded data.
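The lumpability condition and the LR test described above can be illustrated with a small Python sketch. This is not the thesis's method. `is_strongly_lumpable` checks the classical strong-lumpability condition of Kemeny and Snell on a rate matrix that is already known, whereas the thesis tests lumpability statistically from sequence data. `lumpability_lr_test` assumes the usual statistic 2(ℓ₁ − ℓ₀), comparing an unconstrained fit with a fit constrained to be lumpable, and it uses the degrees of freedom stated in the abstract. All function names are hypothetical, and the log-likelihood values are made-up placeholders.

```python
import math


def is_strongly_lumpable(Q, partition, tol=1e-9):
    """Check the classical strong-lumpability condition for a rate matrix.

    Q is an n x n generator (off-diagonal rates >= 0, rows summing to 0).
    partition is a list of blocks, each a list of state indices.
    For every pair of distinct blocks A and B, the total rate from each
    state in A into block B must be the same for all states in A.
    """
    for i, A in enumerate(partition):
        for j, B in enumerate(partition):
            if i == j:
                continue  # within-block total is implied, since rows sum to 0
            totals = [sum(Q[s][k] for k in B) for s in A]
            if max(totals) - min(totals) > tol:
                return False
    return True


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square with a positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def lumpability_lr_test(loglik_unconstrained, loglik_lumped, n_rate_matrices):
    """Hypothetical LR test with H0: the process is lumpable.

    Both log-likelihoods are assumed to come from fits computed elsewhere.
    """
    stat = max(0.0, 2.0 * (loglik_unconstrained - loglik_lumped))
    df = 2 * n_rate_matrices  # degrees of freedom as stated in the abstract
    return stat, df, chi2_sf_even_df(stat, df)


# Illustrative K80-style matrix, states ordered A, G, C, T
# (transition rate a, transversion rate b; hypothetical values).
a, b = 2.0, 0.5
d = -(a + 2 * b)
k80 = [[d, a, b, b], [a, d, b, b], [b, b, d, a], [b, b, a, d]]
purine_pyrimidine = [[0, 1], [2, 3]]
print(is_strongly_lumpable(k80, purine_pyrimidine))  # True
print(lumpability_lr_test(-1000.0, -1004.5, n_rate_matrices=1))
```

Because the degrees of freedom are always even here, the chi-square tail probability has a closed form (a Poisson cumulative sum), so the sketch needs only the standard library. `scipy.stats.chi2.sf` would return the same value.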
Date: 2014-09-01
Licence: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School: Faculty of Science, School of Mathematics and Statistics
Awarding institution: The University of Sydney