Robust estimation and variable selection for cellwise contaminated data
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Su, Peng
Abstract
Outliers are widespread in real-world datasets. Recognizing outliers and running robust analyses is still a challenging topic. Recently, there has been increased attention on cellwise outliers. In contrast to traditional rowwise (observation-wise) outliers, cellwise outliers target individual cells within observations of a dataset, where only specific cells within each row may be contaminated. Several challenges need to be addressed in this field of research, such as outlier detection, robust covariance matrix estimation and robust (sparse) regression. We introduce a Gaussian rank based Lasso estimator, which uses the Gaussian rank correlation to obtain an initial empirical covariance matrix among the response and potential active predictors. We re-parameterise the classical linear regression model design matrix and the response vector to take advantage of these robustly estimated components before applying the adaptive Lasso to obtain consistent variable selection results. We also introduce cellwise regularized Lasso, a regularized regression method to address cellwise outliers, which employs a cellwise shrinkage procedure that shrinks outlying cells based on the magnitude of regression residuals and cell deviations.
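As an illustration of the Gaussian rank correlation mentioned in the abstract, the sketch below computes it for two variables under its usual definition: the Pearson correlation of the normal scores obtained by passing rank/(n + 1) through the standard normal quantile function. This is not the thesis's implementation. The function names `normal_scores` and `gaussian_rank_correlation` are hypothetical, ties are handled with average ranks as an assumption, and the later steps described in the abstract (building the covariance matrix, re-parameterising the regression, applying the adaptive Lasso) are not covered.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map values to Gaussian scores via their ranks (hypothetical helper)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # rank / (n + 1) lies strictly inside (0, 1), so inv_cdf is always defined.
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def gaussian_rank_correlation(x, y):
    """Pearson correlation of the normal scores of x and y."""
    sx, sy = normal_scores(x), normal_scores(y)
    mx, my = sum(sx) / len(sx), sum(sy) / len(sy)
    cov = sum((a - mx) * (b - my) for a, b in zip(sx, sy))
    var_x = sum((a - mx) ** 2 for a in sx)
    var_y = sum((b - my) ** 2 for b in sy)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 4.0, 6.0, 8.0, 1000.0]  # last cell is grossly outlying
    print(gaussian_rank_correlation(x, y))
```

Because only ranks enter the computation, a single extreme cell such as the 1000.0 above cannot dominate the estimate, which is the property that makes the measure attractive under cellwise contamination.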
Date
2023
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Science, School of Mathematics and Statistics
Awarding institution
The University of Sydney