FB*-Index: A Compact Index for Efficient and Exact Density-based Clustering
Access status:
USyd Access
Type
ThesisThesis type
Masters by ResearchAuthor/s
Zhao, BideAbstract
Density-based clustering is a fundamental technique for discovering arbitrarily shaped clusters and handling noise, without requiring the number of clusters to be specified in advance. However, existing methods often struggle with efficiency and accuracy across varying query ...
See moreDensity-based clustering is a fundamental technique for discovering arbitrarily shaped clusters and handling noise, without requiring the number of clusters to be specified in advance. However, existing methods often struggle with efficiency and accuracy across varying query parameters $\varepsilon$ and $\mu$. In this thesis, we propose a novel index-based algorithm for efficient and exact cluster extraction. We introduce FB, the first linear-size index that supports exact clustering with running time linear in the output size for any query $\varepsilon$, along with a compact variant, FB$^*$, for efficiently extracting density-based clusters. Due to the compactness of the index and the efficiency of the query algorithm, our method is well-suited for disk-based storage, enabling multiple versions of the index to support arbitrary $(\varepsilon,\mu)$ queries. We provide formal analyses of time and space complexity. Extensive experiments on 19 real-world datasets demonstrate that our method significantly outperforms existing approaches while guaranteeing exact clustering results.
See less
See moreDensity-based clustering is a fundamental technique for discovering arbitrarily shaped clusters and handling noise, without requiring the number of clusters to be specified in advance. However, existing methods often struggle with efficiency and accuracy across varying query parameters $\varepsilon$ and $\mu$. In this thesis, we propose a novel index-based algorithm for efficient and exact cluster extraction. We introduce FB, the first linear-size index that supports exact clustering with running time linear in the output size for any query $\varepsilon$, along with a compact variant, FB$^*$, for efficiently extracting density-based clusters. Due to the compactness of the index and the efficiency of the query algorithm, our method is well-suited for disk-based storage, enabling multiple versions of the index to support arbitrary $(\varepsilon,\mu)$ queries. We provide formal analyses of time and space complexity. Extensive experiments on 19 real-world datasets demonstrate that our method significantly outperforms existing approaches while guaranteeing exact clustering results.
See less
Date
2025Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.Faculty/School
Faculty of EngineeringAwarding institution
The University of SydneyShare