| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Zou, Qunsheng | en_AU |
| dc.contributor.author | Yang, Kuo | en_AU |
| dc.contributor.author | Shu, Zixin | en_AU |
| dc.contributor.author | Chang, Kai | en_AU |
| dc.contributor.author | Zheng, Qiguang | en_AU |
| dc.contributor.author | Zheng, Yi | en_AU |
| dc.contributor.author | Lu, Kezhi | en_AU |
| dc.contributor.author | Xu, Ning | en_AU |
| dc.contributor.author | Tian, Haoyu | en_AU |
| dc.contributor.author | Li, Xiaomeng | en_AU |
| dc.contributor.author | Yang, Yuxia | en_AU |
| dc.contributor.author | Zhou, Yana | en_AU |
| dc.contributor.author | Yu, Haibin | en_AU |
| dc.contributor.author | Zhang, Xiaoping | en_AU |
| dc.contributor.author | Xia, Jianan | en_AU |
| dc.contributor.author | Zhu, Qiang | en_AU |
| dc.contributor.author | Poon, Josiah | en_AU |
| dc.contributor.author | Poon, Simon | en_AU |
| dc.contributor.author | Zhang, Runshun | en_AU |
| dc.contributor.author | Li, Xiaodong | en_AU |
| dc.contributor.author | Zhou, Xuezhong | en_AU |
| dc.date.accessioned | 2022-04-28T02:45:03Z | |
| dc.date.available | 2022-04-28T02:45:03Z | |
| dc.date.issued | 2022 | |
| dc.identifier.uri | https://hdl.handle.net/2123/28321 | |
| dc.description.abstract | Biomedical named entity recognition (BioNER) from clinical texts is a fundamental task for clinical data analysis due to the availability of large volume of electronic medical record data, which are mostly in free text format, in real-world clinical settings. Clinical text data incorporates significant phenotypic medical entities (e.g., symptoms, diseases, and laboratory indexes), which could be used for profiling the clinical characteristics of patients in specific disease conditions (e.g., Coronavirus Disease 2019 (COVID-19)). However, general BioNER approaches mostly rely on coarse-grained annotations of phenotypic entities in benchmark text dataset. Owing to the numerous negation expressions of phenotypic entities (e.g., "no fever," "no cough," and "no hypertension") in clinical texts, this could not feed the subsequent data analysis process with well-prepared structured clinical data. In this paper, we developed Human-machine Cooperative Phenotypic Spectrum Annotation System (http://www.tcmai.org/login, HCPSAS) and constructed a fine-grained Chinese clinical corpus. Thereafter, we proposed a phenotypic named entity recognizer: Phenonizer, which utilized BERT to capture character-level global contextual representation, extracted local contextual features combined with bidirectional long short-term memory, and finally obtained the optimal label sequences through conditional random field. The results on COVID-19 dataset show that Phenonizer outperforms those methods based on Word2Vec with an F1-score of 0.896. By comparing character embeddings from different data, it is found that character embeddings trained by clinical corpora can improve F-score by 0.0103. In addition, we evaluated Phenonizer on two kinds of granular datasets and proved that fine-grained dataset can boost methods' F1-score slightly by about 0.005. Furthermore, the fine-grained dataset enables methods to distinguish between negated symptoms and presented symptoms. Finally, we tested the generalization performance of Phenonizer, achieving a superior F1-score of 0.8389. In summary, together with fine-grained annotated benchmark dataset, Phenonizer proposes a feasible approach to effectively extract symptom information from Chinese clinical texts with acceptable performance. | en_AU |
| dc.language.iso | en | en_AU |
| dc.subject | COVID-19 | en_AU |
| dc.subject | Coronavirus | en_AU |
| dc.title | Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts | en_AU |
| dc.type | Article | en_AU |
| dc.identifier.doi | 10.1155/2022/3524090 | |
| dc.relation.other | Ministry of Science and Technology of the People's Republic of China | en_AU |
| dc.relation.other | National Natural Science Foundation of China | en_AU |
| dc.relation.other | China Academy of Chinese Medical Sciences | en_AU |


Associated file/s

There are no files associated with this item.

Associated collections

There are no previous versions of the item available.