Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts

Zou, Qunsheng; Yang, Kuo; Shu, Zixin; Chang, Kai; Zheng, Qiguang; Zheng, Yi; Lu, Kezhi; Xu, Ning; Tian, Haoyu; Li, Xiaomeng; Yang, Yuxia; Zhou, Yana; Yu, Haibin; Zhang, Xiaoping; Xia, Jianan; Zhu, Qiang; Poon, Josiah; Poon, Simon; Zhang, Runshun; Li, Xiaodong; Zhou, Xuezhong

Field	Value	Language
dc.contributor.author	Zou, Qunsheng	en_AU
dc.contributor.author	Yang, Kuo	en_AU
dc.contributor.author	Shu, Zixin	en_AU
dc.contributor.author	Chang, Kai	en_AU
dc.contributor.author	Zheng, Qiguang	en_AU
dc.contributor.author	Zheng, Yi	en_AU
dc.contributor.author	Lu, Kezhi	en_AU
dc.contributor.author	Xu, Ning	en_AU
dc.contributor.author	Tian, Haoyu	en_AU
dc.contributor.author	Li, Xiaomeng	en_AU
dc.contributor.author	Yang, Yuxia	en_AU
dc.contributor.author	Zhou, Yana	en_AU
dc.contributor.author	Yu, Haibin	en_AU
dc.contributor.author	Zhang, Xiaoping	en_AU
dc.contributor.author	Xia, Jianan	en_AU
dc.contributor.author	Zhu, Qiang	en_AU
dc.contributor.author	Poon, Josiah	en_AU
dc.contributor.author	Poon, Simon	en_AU
dc.contributor.author	Zhang, Runshun	en_AU
dc.contributor.author	Li, Xiaodong	en_AU
dc.contributor.author	Zhou, Xuezhong	en_AU
dc.date.accessioned	2022-04-28T02:45:03Z
dc.date.available	2022-04-28T02:45:03Z
dc.date.issued	2022
dc.identifier.uri	https://hdl.handle.net/2123/28321
dc.description.abstract	Biomedical named entity recognition (BioNER) from clinical texts is a fundamental task for clinical data analysis, because real-world clinical settings produce large volumes of electronic medical record data, mostly in free-text format. Clinical text contains important phenotypic medical entities (e.g., symptoms, diseases, and laboratory indexes) that can be used to profile the clinical characteristics of patients with specific disease conditions (e.g., Coronavirus Disease 2019 (COVID-19)). However, general BioNER approaches mostly rely on coarse-grained annotations of phenotypic entities in benchmark text datasets. Because clinical texts contain numerous negated expressions of phenotypic entities (e.g., "no fever," "no cough," and "no hypertension"), such annotations cannot supply the subsequent data analysis process with well-prepared structured clinical data. In this paper, we developed the Human-machine Cooperative Phenotypic Spectrum Annotation System (http://www.tcmai.org/login, HCPSAS) and constructed a fine-grained Chinese clinical corpus. We then proposed a phenotypic named entity recognizer, Phenonizer, which uses BERT to capture character-level global contextual representations, extracts local contextual features with a bidirectional long short-term memory network, and finally obtains the optimal label sequence through a conditional random field. Results on a COVID-19 dataset show that Phenonizer outperforms methods based on Word2Vec, with an F1-score of 0.896. A comparison of character embeddings trained on different data shows that embeddings trained on clinical corpora improve the F-score by 0.0103. In addition, we evaluated Phenonizer on datasets of two granularities and found that the fine-grained dataset raises the methods' F1-scores slightly, by about 0.005. Furthermore, the fine-grained dataset enables the methods to distinguish negated symptoms from present symptoms. Finally, we tested the generalization performance of Phenonizer, which achieved a superior F1-score of 0.8389. In summary, together with the fine-grained annotated benchmark dataset, Phenonizer offers a feasible approach for effectively extracting symptom information from Chinese clinical texts with acceptable performance.	en_AU
dc.language.iso	en	en_AU
dc.subject	COVID-19	en_AU
dc.subject	Coronavirus	en_AU
dc.title	Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts	en_AU
dc.type	Article	en_AU
dc.identifier.doi	10.1155/2022/3524090
dc.relation.other	Ministry of Science and Technology of the People's Republic of China	en_AU
dc.relation.other	National Natural Science Foundation of China	en_AU
dc.relation.other	China Academy of Chinese Medical Sciences	en_AU
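
The abstract describes a pipeline in which per-character scores are turned into an optimal label sequence by a conditional random field. The following is a minimal sketch of that final decoding step only (Viterbi search), not the authors' implementation. The tag names (`B-NEG`, `I-NEG`, `B-SYM`, `I-SYM`), the transition rule, the assumption that a negated mention such as "no fever" is tagged as one span, and the hand-written emission scores (standing in for BERT and BiLSTM outputs) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical fine-grained tag set: negated symptoms get their own type.
TAGS = ["O", "B-SYM", "I-SYM", "B-NEG", "I-NEG"]
FORBIDDEN = -1e4  # large penalty used to rule out invalid BIO transitions


def transition_score(prev: str, curr: str) -> float:
    """Toy transition scores: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        kind = curr[2:]
        if prev not in (f"B-{kind}", f"I-{kind}"):
            return FORBIDDEN
    return 0.0


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence for per-character scores."""
    if not emissions:
        return []
    # A sequence cannot start inside an entity.
    score = {
        t: emissions[0].get(t, 0.0) + (FORBIDDEN if t.startswith("I-") else 0.0)
        for t in TAGS
    }
    backpointers: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for curr in TAGS:
            best_prev = max(TAGS, key=lambda p: score[p] + transition_score(p, curr))
            new_score[curr] = (
                score[best_prev] + transition_score(best_prev, curr) + em.get(curr, 0.0)
            )
            pointer[curr] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final tag.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


# Illustrative input: "no fever, cough". Scores are invented, not model output.
chars = list("无发热，咳嗽")
emissions = [
    {"B-NEG": 2.0, "O": 0.5},
    {"I-NEG": 1.5, "B-SYM": 1.0},
    {"I-NEG": 1.5, "I-SYM": 1.0},
    {"O": 2.0},
    {"B-SYM": 2.0},
    {"I-SYM": 2.0, "I-NEG": 2.5},  # a per-character argmax would pick I-NEG here
]
tags = viterbi(emissions)
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

In this toy input the last character scores highest as `I-NEG` in isolation, but the transition constraint makes the decoder keep it inside the preceding `B-SYM` span, which is the kind of sequence-level consistency a CRF layer is meant to provide.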

Associated file/s

There are no files associated with this item.

Associated collections

Research Publications and Outputs
