Field | Value | Language
dc.contributor.author | Xu, Ning |
dc.date.accessioned | 2020-07-24 |
dc.date.available | 2020-07-24 |
dc.date.issued | 2020 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/22920 |
dc.description.abstract | From the perspective of econometrics, an accurate variable selection method greatly enhances the reliability of causal analysis and interpretation of the estimators, especially in a world of ever-expanding data dimensions. While variable selection methods in machine learning and statistics have been developed rapidly and applied widely in different branches of data science in the last decade, they have been more slowly adopted in econometrics. Nevertheless, the machine learning methods, including lasso, forward regression, cross-validation and marginal correlation ranking (also called variable screening), are subject to a range of issues that may result in errors in variable selection and inaccurate causal interpretation. I propose two new variable-selection methods that significantly mitigate the issues with existing techniques and that provide accurate variable selection and reliable causal structure estimation in high-dimensional data. In Chapter 1, I develop bounds for cross-validation errors that may be used as a criterion for variable selection with many existing learning algorithms (including lasso, forward regression and variable screening), yielding a sparse and stable model that retains all of the relevant variables. In Chapter 2, I develop an entirely new learning algorithm for variable selection, subsample-ordered least-angle regression (solar), and show in simulations that solar outperforms coordinate descent and lars-lasso in terms of the sparsity, stability, accuracy, and robustness of variable selection. In Chapter 3, I demonstrate the superior variable-selection performance of solar using real-world data from two completely different samples: prostate cancer patients and house prices. I also show that combining solar variable selection with linear probabilistic graph learning yields a plausible, data-driven method to recover causal structure in data. | en_AU
dc.language.iso | en | en_AU
dc.publisher | University of Sydney | en_AU
dc.subject | variable selection | en_AU
dc.subject | least angle regression | en_AU
dc.subject | directed acyclic graph | en_AU
dc.subject | constraint based learning | en_AU
dc.subject | causal structure recovery | en_AU
dc.subject | high dimensional spaces | en_AU
dc.title | Accurate variable selection and causal structure recovery in high-dimensional data | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Arts and Social Sciences::School of Economics | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Fisher, Timothy |

