Model selection and estimating degrees of freedom in Bayesian linear and linear mixed effect models
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
You, Chong
Abstract
In practical applications of statistics, model selection is one of the most fundamental analysis tasks. Selection approaches in the linear regression context have been developed extensively since the 1970s and are now in wide use. However, existing methods have very limited ability to analyse high-dimensional data efficiently and effectively. Moreover, most current model selection techniques were developed for the linear regression model only, and thus may not be appropriate in other model contexts, such as the linear mixed effects model. This thesis explores theory for a fast alternative method to select models and investigates effective procedures to select linear mixed effects models. The main emphasis is on establishing desirable properties of Variational Bayes estimates in Bayesian linear models and on developing a new generalized degrees of freedom measure for linear mixed effects models. This thesis is structured into five chapters. Chapter 1 provides background on the models and assumptions used, summarises some popular model selection techniques, and gives a brief introduction to approximate Bayesian inference. In Chapter 2, we introduce Variational Bayes (VB) -- a fast alternative to Markov chain Monte Carlo for performing approximate Bayesian inference on large datasets. VB is often criticised, typically on empirical grounds, for being unable to produce valid statistical inferences. We contradict this criticism in the Bayesian linear model context for a particular choice of priors: we prove that, under mild regularity conditions, VB-based estimators for the coefficients and the variance are consistent estimators of the true parameters. Furthermore, we propose two variational Bayes information criteria: the variational Akaike information criterion and the variational Bayesian information criterion.
We show that the variational Akaike information criterion is asymptotically equivalent to the frequentist Akaike information criterion and that the variational Bayesian information criterion shares the same first-order asymptotic properties as the Bayesian information criterion in linear regression. Computationally, in the context of linear regression models, the variational Akaike information criterion and the variational Bayesian information criterion have no advantage over the Akaike information criterion and the Bayesian information criterion. However, they are derived naturally in Bayesian contexts with VB estimates, and their asymptotic properties in linear regression models motivate the potential use of variational Bayes based information criteria for more complex models. In particular, the variational Akaike information criterion and the variational Bayesian information criterion can be used when it is not clear what the appropriate sample size and model size are. In addition, we provide a new degrees of freedom measure for Bayesian linear models. These results are published in You et al. (2014). Encouraged by the consistency properties in Chapter 2, we extend the VB approach to more complicated priors in Chapter 3. Namely, we develop algorithms for model selection in a Bayesian linear model context with spike and slab priors, and investigate the theoretical properties of the corresponding VB estimators. We show that, under mild regularity conditions, VB-based estimators for the coefficients are consistent estimators of the true parameters. More importantly, we prove that the VB estimators of the model indicator variables converge in probability to zero at an exponential rate if the corresponding true coefficient is zero, and to one at an exponential rate if the true coefficient is non-zero. This property allows us to use VB estimates to select high-dimensional models efficiently.
We also investigate the selection of initial values to avoid local optimality problems. Simulation results suggest that our method is competitive in efficiency and effectiveness with the various alternative model selection procedures considered. In Chapter 4, we shift our focus from the linear model to a more complicated model context -- the linear mixed effects model. The linear mixed effects model is a commonly employed statistical model; it is highly flexible in dealing with a broad range of data and much more complicated than the linear regression model. However, there is not yet consensus in the statistical community on which model selection method to use. Chapter 4 considers the linear mixed effects model selection problem by first establishing an appropriate model complexity measure. The number of unknown regression parameters, which is typically used as the degrees of freedom to measure model complexity in linear regression models, may fail to work for linear mixed effects models. Hence, we propose a new definition of generalized degrees of freedom. It is derived from a somewhat non-likelihood point of view by using the sum of the sensitivities of the expected fitted values with respect to their underlying true means. We show that the proposed generalized degrees of freedom satisfies some desirable properties: (1) it equals the number of unknown regression parameters in linear regression if the least squares estimates are used, and (2) empirically, it is non-negative and monotone when models are nested. Furthermore, we explore and compare different solutions to approximate this generalized degrees of freedom. Two model selection procedures for linear mixed effects models based on the generalized degrees of freedom are then proposed; these can select random effects and fixed effects simultaneously, rather than selecting fixed and random effects separately. The results of Chapter 4 are published in You et al. (2015).
Chapter 5 provides a summary of the key contributions of this thesis as well as a discussion of some extensions.
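As a rough illustration of property (1) described for Chapter 4, the sketch below checks numerically that, for ordinary least squares, the summed sensitivity of the expected fitted values to the true means equals the number of regression parameters. For least squares the fitted values are linear in the response (fitted = H y), so the sensitivities are the diagonal entries of the hat matrix H and their sum is its trace. This is only the classical linear regression case under an assumed simulated design, not the thesis's measure for linear mixed effects models; the function name `ols_degrees_of_freedom` and the simulated data are hypothetical.

```python
import numpy as np


def ols_degrees_of_freedom(X):
    """Return trace(H) for the OLS hat matrix H = X (X^T X)^{-1} X^T.

    Hypothetical helper for illustration; assumes X has full column rank.
    """
    # With the reduced QR decomposition X = Q R, the hat matrix is H = Q Q^T.
    Q, _ = np.linalg.qr(X)
    # trace(Q Q^T) equals the sum of squared entries of Q.
    return float(np.sum(Q ** 2))


# Assumed simulated design: an intercept plus three Gaussian covariates.
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

df = ols_degrees_of_freedom(X)
print(f"trace of hat matrix: {df:.6f}, number of parameters: {p}")
```

The QR route avoids forming and inverting X^T X explicitly, which is a numerical convenience rather than anything specific to the thesis.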
Date
2014-08-01
Licence
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Science, School of Mathematics and Statistics
Awarding institution
The University of Sydney