Resampling Based Model Selection for Correlated and Complex Data
Field | Value | Language |
dc.contributor.author | Smith, Connor James | |
dc.date.accessioned | 2022-02-09T04:14:49Z | |
dc.date.available | 2022-02-09T04:14:49Z | |
dc.date.issued | 2022 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/27428 | |
dc.description.abstract | Variable selection is a key component of regression modelling, but slight changes to the initial data can change the models identified. In this thesis, we identify and examine several problems in variable selection for which statistical frameworks are currently lacking, and show how stability-based approaches can be used to construct solutions. At its core, this thesis tackles complex data in a generalized linear model (GLM) framework, in both robust and higher-dimensional settings. We target three main aspects: (1) the inability to use exhaustive variable selection approaches within the robust generalized linear model space; (2) the difficulty of stable variable selection for omics microarray data, where the number of variables is significantly larger than the total number of observations; and (3) extracting information from multiple penalized regression solution paths to classify variables into different classes through both automated and visual classification. In Chapter 1, we provide an overview of variable selection methods, with the main focus placed on GLMs. In Chapter 2, we bring variable selection methods in a robust GLM space closer to the gold standard of the exhaustive search through the new RobStab (Robust Stability) framework. In Chapter 3, we propose a novel stability-based variable selection method, VIVID (VIsualisation of Variable Importance Differences), for omics GLM data. In Chapter 4, we expand upon the use of a single tuning parameter within penalized regression for variable selection with the new method ParSPaS. In Chapter 5, we make some final remarks and conclude the thesis. For all proposed methods, we provide publicly available computational implementations in R. | en_AU |
dc.language.iso | en | en_AU |
dc.subject | Variable Selection | en_AU |
dc.subject | Stability Selection | en_AU |
dc.subject | Statistics | en_AU |
dc.title | Resampling Based Model Selection for Correlated and Complex Data | en_AU |
dc.type | Thesis | |
dc.type.thesis | Doctor of Philosophy | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Science::School of Mathematics and Statistics | en_AU |
usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Mueller, Samuel | |