Revealing the Development of Immune Response through Temporal Dynamic Clustering
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Putri, Givanna Haryono
Abstract
The immune system is an integral part of our body and is responsible for keeping us healthy. For instance, under pathogenic invasion, an immune response is mounted to identify and clear the pathogens. The immune response is not always successful: whilst some individuals recover from a given infection, others do not. This has spurred efforts to better understand the process in the hope of finding effective health-promoting interventions. Cytometry is a key technology for quantifying the immune response, creating datasets that capture measurements of numerous characteristics of each individual cell in a biological sample. It has become apparent that the immune response encompasses a great many distinct cell types that all interact in a coordinated fashion across many organs, and that the process is highly dynamic over time. Understanding the immune response, and how and when to intervene, has emerged as a very challenging task. Importantly, it is a task that can be framed as a data science problem. Cytometry datasets are created at several time-points post-infection, or at different disease severity stages, and the task is to identify which immune cell populations are present, how they each evolve or vary, and to map this onto time/disease-stage. These immune response maps, once created, can assist clinicians in improving health outcomes. We first propose a novel density-based clustering and cluster tracking algorithm, ChronoClust, for tracking the temporal changes of cell populations in time-series of discrete cytometry datasets. We conduct a comprehensive qualitative and quantitative evaluation of ChronoClust’s performance on: (i) a synthetic dataset capturing the characteristics of an immune response as observed through temporal cytometry data, and (ii) a real cytometry dataset elucidating the immune response development in the bone marrow of West Nile Virus (WNV)-infected mice (WNV dataset).
Our results demonstrate the ability of ChronoClust to cluster cells into cell populations and track their evolution in an unsupervised and automated manner. We then investigate the potential of dimensionality reduction techniques to ease the computational burden of clustering and tracking temporal cytometry data whilst minimally diminishing the clustering and cluster tracking performance. We explore 3 dimensionality reduction techniques in conjunction with ChronoClust. To obtain a broad sample of clustering performances, the full and reduced WNV datasets are independently clustered 400 times using 400 unique ChronoClust hyperparameter value sets. We conclude that for large, unwieldy datasets, dimensionality reduction can prove advantageous if the computational expense is otherwise prohibitive. Many clustering algorithms now exist for clustering cytometry data into discrete cell populations. Comparative algorithm evaluations on benchmark datasets rely on either a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasise different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding the optimal clustering algorithms. We propose ParetoBench, a Pareto front-based framework as an integrative evaluation protocol, wherein individual performance metrics are leveraged as complementary perspectives, and a broad systematic sampling of algorithms’ hyperparameter values is used to reveal how meticulously those algorithms must be tuned to obtain good clustering performance. We exemplify the protocol by comparing 3 clustering algorithms (ChronoClust, FlowSOM and Phenograph) using 4 performance metrics applied across 4 cytometry datasets.
Next, we present TrackSOM, a fast and effective clustering and cluster tracking algorithm which: (1) combines the excellent clustering quality and fast run time of FlowSOM with the tracking capability of ChronoClust, and (2) includes visualisation methods to assist in exploring the uncovered immune response dynamics. TrackSOM encompasses several modes of operation to suit a variety of experimental contexts, from users who know exactly how many cell phenotypes their data contain to those engaged in unguided exploration. We demonstrate TrackSOM’s capacity on both synthetic and real-world datasets, provide usage advice to users, and exemplify novel discovery through its application. For our real-world use-case, we characterise the immune response to WNV infection in mice, uncovering heterogeneous sub-populations of immune cells and relating their functional evolution to disease severity. We perform a parameter sensitivity analysis and demonstrate that TrackSOM has both improved performance and lower sensitivity to parameter value selections compared with ChronoClust. Importantly, TrackSOM verifies the robustness and generalisability of the cytometry-specific cluster tracking approach developed under ChronoClust. Finally, we propose SOMInsight, a novel technique which combines TrackSOM with other computational techniques to compute a set of temporally dynamic immune features that reveal the dynamics of cell populations and discriminate clinical outcomes. Our qualitative evaluation demonstrates SOMInsight’s ability to uncover cell population changes that are consistent with previous biological findings. Furthermore, it yields new biological insights that warrant further biological experiments to corroborate. All computational techniques are publicly available as open-source software to support their use by the community and promote reproducible results.
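The Pareto front-based evaluation mentioned in the abstract rests on the standard notion of dominance: one clustering solution dominates another if it is at least as good on every metric and strictly better on at least one. The following is a minimal illustrative sketch of that idea only, not the thesis's ParetoBench implementation. The function names and the example scores are hypothetical, and it assumes that higher is better for every metric.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if solution a dominates solution b.

    Assumption: every metric is 'higher is better'.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of solutions not dominated by any other solution."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores on two metrics for four clustering solutions,
# e.g. four hyperparameter settings of one algorithm.
scores = [(0.9, 0.5), (0.7, 0.8), (0.6, 0.6), (0.9, 0.4)]
front = pareto_front(scores)
print(front)
```

Here the first two solutions are each best on one metric, so neither dominates the other and both stay on the front. The remaining two are each beaten or matched on both metrics by another solution. Ranking on any single metric would hide that trade-off.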
Date
2021
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney