Analytics in Fintech and Insurtech: Insights Into Grouped Feature Selection, Network Change Point Detection and Structure Learning
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Leung, Wai Yin
Abstract
This thesis consists of three projects that contribute to insights into decision making, and into how humans fit in the age of automation, using data analytic tools.

In the first project, we study the relative role of cryptocurrency in the financial markets, compared to currencies and commodities. We propose a method for detecting changes in interdependence among cryptocurrency, currency, and commodity markets. We adopt the partial correlation matrix as a similarity measure and develop a statistical testing procedure tailored to such a construct with minimal distributional assumptions. We analyze the network properties and centrality measures before and after the change points to examine the role of cryptocurrency in the market.

The second project investigates the selection of relevant attributes in Insurtech data. Modern insurance data have evolved, and so must the insurance industry. Insurance policies are now mostly purchased via the internet. Along the way, immense amounts of individual-level consumer data are continuously generated and stored. Unfortunately, just collecting vast amounts of data is insufficient. To translate a data-intensive environment into data-enabled competitive advantages, one must also carefully select the relevant attributes. An appropriate subset of features identifies which potential customers are high risk, informs premium pricing decisions, and guides the formulation of loss-reserving, marketing, and customer management strategies. We leverage the technological and algorithmic advancements made in the last quarter-century and present an integer optimisation approach to selecting the most relevant indicators of high-risk insurance customers. We propose a novel high-dimensional classifier, the sparse hinge loss group estimator, which minimises the number of attributes in a model subject to a budget on the correlation between the variables and the errors. Using synthetic and empirical data, we demonstrate that, in some circumstances, the performance of the sparse hinge loss group estimator is superior to that of existing popular approaches.

The third project explores the Bayesian network structure learning problem, which aims to learn the cause-and-effect relationships between variables in a dataset. We explore how the performance of the integer program (IP) based Bayesian network learning formulation in Bartlett and Cussens (2017) can be improved by viewing the Bayesian network as a walk. This allows us to solve the IP and to compute the scores iteratively in a greedy way. The proposed method is validated using computational experiments and is found to obtain near-optimal solutions, using the commercial mixed-integer optimization (MIO) solver Gurobi as a benchmark. We also introduce k-layer constraints in a mixed-integer formulation that enable us to control the number of layers in a Bayesian network. This can be beneficial in real-life applications where prior knowledge or restrictions regarding the number of layers are given to enhance interpretability. This project also assesses the potential for approximation algorithms and column generation procedures, based on an interesting observation that the Bayesian network structure learning problem can be reformulated as a maximum flow problem.
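The first project above adopts the partial correlation matrix as a similarity measure between markets. As an illustration only, and not the thesis's own testing procedure, the following minimal sketch shows one standard way to compute such a matrix: invert the sample covariance to get the precision matrix, then rescale and flip the sign of its off-diagonal entries. The function name `partial_correlation` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def partial_correlation(observations: np.ndarray) -> np.ndarray:
    """Partial correlation matrix from a (T, p) array of T observations on p series.

    Uses the identity rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj),
    where Theta is the inverse of the sample covariance matrix.
    Assumes T > p so that the sample covariance is invertible.
    """
    cov = np.cov(observations, rowvar=False)   # p x p sample covariance
    precision = np.linalg.inv(cov)             # Theta
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)               # convention: unit diagonal
    return pcorr

# Hypothetical synthetic data standing in for market return series.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 4))
x[:, 1] += 0.8 * x[:, 0]                       # series 1 depends directly on series 0
P = partial_correlation(x)
```

Each off-diagonal entry of `P` measures the linear association between two series after controlling for all the others, which is why it is often preferred to plain correlation when building dependence networks.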
Date
2021
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
The University of Sydney Business School, Discipline of Business Analytics
Awarding institution
The University of Sydney