
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Nguyen, Hoang Minh Dung | |
| dc.date.accessioned | 2013-10-24 | |
| dc.date.available | 2013-10-24 | |
| dc.date.issued | 2013-03-31 | |
| dc.identifier.uri | http://hdl.handle.net/2123/9466 | |
| dc.description.abstract | In noisy corpora such as clinical data, the text usually contains a large number of misspelled words, abbreviations and acronyms that can be an obstacle to high-quality information extraction and classification. Furthermore, the gold-standard training data needed for supervised learning usually contains many errors and inconsistencies due to differences between human annotators. This research introduces a specialised proof-reading process for the clinical domain that resolves unknown tokens and converts scores and measures into a standard layout. After the automatic correction process, automatic coding of the texts produced significantly more coded content. The system output suggests that the automatic coding and annotation are accurate for notes that had not been coded by the clinical staff. To deal with the problem of noisy training data, this thesis proposes an algorithm named “reverse active learning”, which applies active learning in reverse order to improve the performance of supervised machine learning on clinical corpora. Automatic proof-reading and reverse active learning are shown to produce state-of-the-art results for supervised learning on the i2b2 2010 clinical corpus, and they offer a means of improving all processing strategies in clinical language processing. Finally, a Cancer Staging Information Extraction System is presented that combines the proposed methods of proof-reading, supervised learning, active learning and reverse active learning. In this research, free-text reports are annotated with examples of the information to be extracted, and algorithms are then developed that use these examples to compute a more general model of the desired content. In addition to traditional supervised learning methods such as Conditional Random Fields and Support Vector Machines, active learning approaches are investigated to further improve the performance of the information extraction system. | en_AU |
| dc.rights | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
| dc.subject | information extraction | en_AU |
| dc.subject | active learning | en_AU |
| dc.subject | machine learning | en_AU |
| dc.subject | clinical | en_AU |
| dc.subject | radiology reports | en_AU |
| dc.subject | cancer | en_AU |
| dc.title | Information Extraction from Radiology Reports for a Population Based Cancer Registry | en_AU |
| dc.type | Thesis | en_AU |
| dc.type.thesis | Doctor of Philosophy | en_AU |
| usyd.faculty | Faculty of Engineering and Information Technologies, School of Information Technologies | en_AU |
| usyd.department | Graduate School of Engineering and Information Technologies | en_AU |
| usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
| usyd.awardinginst | The University of Sydney | en_AU |