Deep Learning Methods for Health Data Imputation and Classification

Phung, Le Son

Access status:

Open Access

Field	Value	Language
dc.contributor.author	Phung, Le Son
dc.date.accessioned	2021-04-30T02:30:21Z
dc.date.available	2021-04-30T02:30:21Z
dc.date.issued	2021	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/25007
dc.description.abstract	The analysis of digital health data with machine learning models can be used in clinical applications such as computer-aided diagnosis and population health classification. However, the quality of machine learning is highly dependent on the quality of the underlying data. The quality of a dataset can be negatively impacted by missing values, which are a common occurrence in medical and health datasets and can cause biased estimations or loss of statistical power. Data imputation (or reconstruction) is one way to reduce missing data; however, current methods are generally based on statistical measures (mean, median) or proximity in the dataset (nearest neighbour, last value carried forward). The emergence of deep learning has enabled the derivation of autoencoder (AE) based imputation methods that learn the imputed values from the raw data. Existing methods (both deep and non-deep) do not always take into account the relationship between the values and the distributions of the variables in groups of records, thus leading to a loss in performance when the imputed values are used in classification. In this thesis, our goal is to design unsupervised deep learning imputation methods for missing data in health such that the imputed values result in improvements in downstream classification. Our key contributions are techniques to improve the imputation quality and learning capacity of AE-based imputation models. We describe two novel deep learning techniques: (i) an extension of an AE that treats missing data as noise and can learn hidden representations of the noisy data, and (ii) a deeper network that stacks our first AE as a building block and leverages residual learning to learn more refined imputations.
Our first method, which we have named the Overcomplete Denoising Autoencoder (ODAE), extends an autoencoder to derive a deep learning architecture that can learn the hidden representations of data even when the data is perturbed by missing values (noise). Our model is constructed with an overcomplete representation and trained with denoising regularization. This allows the latent/hidden layers of our model to effectively extract the relationships between different variables; these relationships are then used to reconstruct missing values. In addition to the architecture, our contributions include a new loss function designed to avoid local optima, which helps the model to learn the 'real' (unaffected by missingness) distribution of variables in the dataset. We evaluate our method in comparison with other well-established imputation strategies (mean and median imputation, SVD, kNN, matrix factorization and soft impute) on a publicly available dataset. Our experiments demonstrate that our method achieved a lower imputation mean squared error than the other imputation methods. When assessing the imputation quality using the imputed data for prediction tasks, our experiments show that the data imputed by our method yielded better results than the other imputation methods.	en_AU
dc.language.iso	en	en_AU
dc.subject	imputation	en_AU
dc.subject	autoencoder	en_AU
dc.subject	missing data	en_AU
dc.title	Deep Learning Methods for Health Data Imputation and Classification	en_AU
dc.type	Thesis
dc.type.thesis	Masters by Research	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en_AU
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en_AU
usyd.degree	Master of Philosophy M.Phil	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Kim, Jinman
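
Illustrative note: the abstract describes an overcomplete autoencoder trained with denoising regularization, where missing values are treated as noise. The sketch below shows that general idea in NumPy. It is not the thesis's ODAE. The single tanh hidden layer, the zero-filling of missing entries, the plain masked mean squared error (standing in for the thesis's new loss function, which the abstract does not specify), and every name and hyperparameter (`train_denoising_autoencoder`, `impute`, `hidden_dim`, `corrupt_prob`, `lr`) are assumptions made for illustration only.

```python
import numpy as np


def train_denoising_autoencoder(X, observed, hidden_dim, epochs=500,
                                lr=0.05, corrupt_prob=0.2, seed=0):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent.

    X          : (n, d) array; missing entries may hold NaN.
    observed   : (n, d) boolean array, True where X is observed.
    hidden_dim : hidden width; choose hidden_dim > d for an overcomplete code.
    All defaults are hypothetical choices, not values from the thesis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X0 = np.where(observed, X, 0.0)            # zero-fill missing entries
    W1 = rng.normal(0.0, 0.1, (d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, d))
    b2 = np.zeros(d)
    n_obs = observed.sum()
    losses = []
    for _ in range(epochs):
        # Denoising step: randomly zero out additional inputs each epoch.
        keep = rng.random(X.shape) > corrupt_prob
        X_in = X0 * keep
        H = np.tanh(X_in @ W1 + b1)            # hidden representation
        X_hat = H @ W2 + b2                    # linear reconstruction
        # Masked error: only observed entries contribute to the loss.
        err = (X_hat - X0) * observed
        losses.append(float((err ** 2).sum() / n_obs))
        # Backpropagation of the masked mean squared error.
        g_out = 2.0 * err / n_obs
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)
        g_W1 = X_in.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def impute(X, observed, params):
    """Keep observed values; fill missing ones with the reconstruction."""
    W1, b1, W2, b2 = params
    X0 = np.where(observed, X, 0.0)
    X_hat = np.tanh(X0 @ W1 + b1) @ W2 + b2
    return np.where(observed, X, X_hat)
```

In this sketch, calling `train_denoising_autoencoder(X, observed, hidden_dim=16)` on a four-column dataset gives an overcomplete hidden layer (16 > 4), and `impute` then replaces only the missing entries. Computing the loss on observed entries alone is one simple way to keep the zero placeholders from being learned as targets.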