Field | Value | Language
dc.contributor.author | Estival, Dominique |
dc.contributor.author | Gaustad, Tanja |
dc.contributor.author | Hutchinson, Ben |
dc.contributor.author | Pham, Son Bao |
dc.contributor.author | Radford, Will |
dc.date.accessioned | 2010-02-08 |
dc.date.available | 2010-02-08 |
dc.date.issued | 2008-01-01 |
dc.identifier.uri | http://hdl.handle.net/2123/5839 |
dc.description | Submitted for publication in 2008 | en_AU
dc.description.abstract | This paper reports on some aspects of a research project aimed at automating the analysis of texts for the purpose of author profiling and identification. The Text Attribution Tool (TAT) was developed for the purpose of language-independent author profiling and has now been trained on two email corpora, English and Arabic. The complete analysis provides probabilities for the author’s basic demographic traits (gender, age, geographic origin, level of education and native language) as well as for five psychometric traits. The prototype system also provides a probability of a match with other texts, whether from known or unknown authors. A very important part of the project was the data collection and we give an overview of the collection process as well as a detailed description of the corpus of email data which was collected. We describe the overall TAT system and its components before outlining the ways in which the email data is processed and analysed. Because Arabic presents particular challenges for NLP, this paper also describes more specifically the text processing components developed to handle Arabic emails. Finally, we describe the Machine Learning setup used to produce classifiers for the different author traits and we present the experimental results, which are promising for most traits examined. | en_AU
dc.description.sponsorship | The work presented in this paper was carried out while the authors were working at Appen Pty Ltd., Chatswood NSW 2067, Australia | en_AU
dc.language.iso | en_AU | en_AU
dc.subject | computational linguistics | en_AU
dc.subject | text attribution | en_AU
dc.subject | machine learning | en_AU
dc.subject | author profiling | en_AU
dc.subject | natural language processing | en_AU
dc.subject | English | en_AU
dc.subject | Arabic | en_AU
dc.title | Author Profiling for English and Arabic Emails | en_AU
dc.type | Article | en_AU
usyd.department | Department of Linguistics | en_AU

