|Title:||Author Profiling for English and Arabic Emails|
Pham, Son Bao
Department of Linguistics
natural language processing
|Abstract:||This paper reports on aspects of a research project aimed at automating the analysis of texts for author profiling and identification. The Text Attribution Tool (TAT) was developed for language-independent author profiling and has now been trained on two email corpora, one English and one Arabic. The complete analysis provides probabilities for the author’s basic demographic traits (gender, age, geographic origin, level of education and native language) as well as for five psychometric traits. The prototype system also provides a probability of a match with other texts, whether from known or unknown authors. Data collection was a very important part of the project, and we give an overview of the collection process as well as a detailed description of the resulting corpus of email data. We describe the overall TAT system and its components before outlining how the email data is processed and analysed. Because Arabic presents particular challenges for NLP, this paper also describes in more detail the text processing components developed to handle Arabic emails. Finally, we describe the machine learning setup used to produce classifiers for the different author traits and present the experimental results, which are promising for most traits examined.|
|Description:||Submitted for publication in 2008|
|Department/Unit/Centre:||Department of Linguistics|
|Type of Work:||Article|
|Appears in Collections:||Online Publications|
This work is protected by Copyright. All rights reserved. Access to this work is provided for the purposes of personal research and study. Except where permitted under the Copyright Act 1968, this work must not be copied or communicated to others without the express permission of the copyright owner. Use the persistent URI in this record to enable others to access this work.
|TAT-NLE-Paper.pdf||Main article||212.32 kB||Adobe PDF|
|TAT-NLE-Figures.pdf||Figures||143.52 kB||Adobe PDF|