Show simple item record

Field | Value | Language
dc.contributor.author | Miller, Justin |
dc.date.accessioned | 2025-05-22T06:11:04Z |
dc.date.available | 2025-05-22T06:11:04Z |
dc.date.issued | 2025 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/33925 |
dc.description.abstract | This thesis addresses the challenges of clustering short text data, focusing on human interpretability and validation metrics. Employing Gaussian Mixture Models with embeddings from Large Language Models, this thesis demonstrates that these methods produce clusters that are more interpretable than traditional approaches. The thesis introduces the concept of multi-level clustering, an approach that examines how clusters form and evolve as the number of clusters in an algorithm increases. It also introduces a method to maximise the information conveyed in each cluster, while minimising the cognitive load required to understand the clusters. The findings bridge the gap between automated metrics and human evaluation, offering insights into optimal clustering techniques for short text. This is then used to examine human identity in Twitter bios and create visualisations that provide a better understanding of clusters, as well as employing linguistic methodology to identify key distinctions between the clusters. | en_AU
dc.language.iso | en | en_AU
dc.subject | Clustering | en_AU
dc.subject | Large Language Models | en_AU
dc.subject | Short Text | en_AU
dc.title | Short Text Clustering with Large Language Models | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Science::School of Physics | en_AU
usyd.department | Physics | en_AU
usyd.degree | Doctor of Philosophy Ph.D. | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Alexander, Tristram |


There are no previous versions of the item available.