Chinese electronic medical record named entity recognition based on sentence-level lattice-long short-term memory neural network
10.16781/j.0258-879x.2019.05.0497
- Author:
Cui-Ran PAN 1
Author Information
1. Department of Medical Informatics, School of Medicine, Nantong University
- Publication Type:Journal Article
- Keywords:
Bi-directional long short-term memory neural network;
Computerized medical records systems;
Conditional random field;
Electronic medical record;
Entity identification;
Lattice-long short-term memory neural network
- From:
Academic Journal of Second Military Medical University
2019;40(5):497-506
- Country:China
- Language:Chinese
- Abstract:
Objective To propose a conditional random field (CRF) model based on a new word segmentation method, Re-entity, and to compare it with the bi-directional long short-term memory neural network (BiLSTM)-CRF and the Lattice-long short-term memory neural network (Lattice-LSTM). Methods After analyzing existing entity recognition methods, we proposed a CRF method based on Re-entity, together with BiLSTM-CRF and Lattice-LSTM models, for task one of the China Conference on Knowledge Graph and Semantic Computing in 2018 (CCKS2018), Chinese clinical named entity recognition, and trained character vector sets at different parameter levels on different corpora. Comparative experiments on model performance were carried out across the different neural network models for each method. Finally, a comparative study was carried out on different input lengths, such as the sentence level and the text level. Results The Re-entity method can improve the performance of the CRF model. The sentence-level Lattice-LSTM model achieved a strict F1-measure of 89.75% on this task, which was higher than the highest F1-measure (89.25%) on task one of CCKS2018. Conclusion The CRF model based on Re-entity can effectively improve the recognition rate of traditional Chinese medicines in electronic medical records by using normalized Chinese clinical drug names. The Re-entity method can reduce the error accumulation caused by word segmentation in data preprocessing. The lattice structure can better combine the latent semantic information of characters and word sequences. In addition, sentence-level input can effectively improve the recognition accuracy of neural network models.
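The strict F1-measure reported in the Results can be illustrated with a minimal Python sketch. It assumes that "strict" means a predicted entity counts as correct only when its start offset, end offset, and entity type all match a gold entity exactly; the abstract does not spell out the scoring rule. The function name `strict_f1`, the `Entity` tuple layout, and the type labels in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> float:
    """Return the F1 score under exact-match (assumed "strict") scoring."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    if not gold_set or not pred_set:
        return 0.0
    # A prediction is a true positive only if boundaries and type all match.
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, with gold entities `[(0, 2, "drug"), (5, 8, "symptom")]` and predictions `[(0, 2, "drug"), (5, 9, "symptom")]`, the second prediction is off by one character at its right boundary and is therefore counted as wrong, so precision and recall are both 0.5.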