Corporate bankruptcy prediction: Analysis of statistical and machine learning models using accounting, market, market microstructure, and derivative instrument information
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Alam, Md Nurul
Abstract
Over the last seven decades, corporate bankruptcy and distress prediction have garnered interest from academics and practitioners alike. Prediction of firm failure is used in a variety of contexts, ranging from auditing to regulatory processes and procedures. High-profile corporate collapses (e.g., Lehman Brothers and Washington Mutual) have driven stakeholder interest in the development of robust models in both the private and public sectors. Models that are based on artificial intelligence (AI) and machine learning (ML) show higher prediction accuracy than other models, which has contributed to the recent rise in their popularity. While there is much debate surrounding the performance of ML models relative to traditional parametric methods, limited discussion has focused on the performance of ML models relative to other statistical models, such as the semi-parametric hazard model and the multivariate adaptive regression splines (MARS) model. This study aims to address this gap by comparing the predictive power of ML models with that of various statistical models. It also investigates the predictive power of various market-based variables relative to accounting-based variables. To gain robust insights into the contribution of both accounting and market-based variables to bankruptcy prediction, I use statistical models (the semi-parametric hazard model, a standard logit model, and MARS) as well as ML models (random forest (RF®) and gradient boosting (TreeNet®)). In addition to the free software R, I use commercial statistical software from Salford Systems. Another key focus of this study is to gain an understanding of the usefulness and limitations of the different models. To do so, I compare the output of each model.
Further, the literature has largely ignored the time dimension when assessing the contribution of different variables, which makes comparability challenging. This study examines whether the contributions of variables remain the same over the years. To address these research problems, this study uses a very large sample, compiling different types of variables from the COMPUSTAT, CRSP, Supplemental Short Interest File, S&P Capital IQ, Audit Analytics, and MARKIT databases. The study provides evidence that some under-explored market-based variables, such as expected default frequency, contribute significantly to bankruptcy prediction. The analysis also suggests that, in addition to market-based variables, accounting-based variables should be used in any well-defined model of bankruptcy prediction, as they also contribute significantly to final model building. Further, I conclude that the advanced ML models, such as RF® and TreeNet®, have superior predictive accuracy. Moreover, these ML models provide some interpretable outputs, such as relative variable importance and partial dependence plots. I also find that splitting the whole sample into sub-samples based on time period, rather than by the traditional random percentage partition, makes it possible to assess whether the predictive ability of variables remains the same. Results from both ML models suggest that the performance of variables varies over time and, interestingly, that the performance of market-based variables is stronger in later periods.
Date
2021
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
The University of Sydney Business School, Discipline of Accounting
Awarding institution
The University of Sydney