| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Dai, Xiang | |
| dc.date.accessioned | 2021-06-23T04:03:30Z | |
| dc.date.available | 2021-06-23T04:03:30Z | |
| dc.date.issued | 2021 | en_AU |
| dc.identifier.uri | https://hdl.handle.net/2123/25482 | |
| dc.description.abstract | The number of biomedical documents is growing at a staggering rate. Unlocking the information trapped in these documents can enable researchers and practitioners to operate confidently in the information world. Biomedical Named Entity Recognition (NER), the task of recognising biomedical names, is usually employed as the first step of the NLP pipeline. Standard NER models, based on sequence tagging techniques, are good at recognising short entity mentions in the generic domain. However, there are several open challenges in applying these models to recognise biomedical names: ● Biomedical names may have complex inner structures (discontinuity and overlapping) which cannot be recognised using standard sequence tagging techniques; ● The training of NER models usually requires a large amount of labelled data, which is difficult to obtain in the biomedical domain; and, ● Commonly used language representation models are pre-trained on generic data; a domain shift therefore exists between these models and target biomedical data. To deal with these challenges, we explore several research directions and make the following contributions: (1) we propose a transition-based NER model which can recognise discontinuous mentions; (2) we develop a cost-effective approach that nominates suitable pre-training data; and, (3) we design several data augmentation methods for NER. Our contributions have clear practical implications, especially when new biomedical applications are needed. Our proposed data augmentation methods can help the NER model achieve decent performance while requiring only a small amount of labelled data. Our investigation into selecting pre-training data can improve the model by incorporating language representation models that are pre-trained on in-domain data. Finally, our proposed transition-based NER model can further improve performance by recognising discontinuous mentions. | en_AU |
| dc.language.iso | en | en_AU |
| dc.subject | natural language processing | en_AU |
| dc.subject | information extraction | en_AU |
| dc.subject | named entity recognition | en_AU |
| dc.subject | biomedical NLP | en_AU |
| dc.subject | discontinuous NER | en_AU |
| dc.title | Recognising Biomedical Names: Challenges and Solutions | en_AU |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en_AU |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
| usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
| usyd.awardinginst | The University of Sydney | en_AU |
| usyd.advisor | Gudmundsson, Joachim | |

