Study on Automatic Word Segmentation for Traditional Chinese Medical Record Literature
10.3969/j.issn.1005-5304.2015.02.012
- VernacularTitle:中医医案文献自动分词研究
- Author: Fan ZHANG; Xiaofeng LIU; Yan SUN
- Publication Type:Journal Article
- Keywords: traditional Chinese medical record literature; automatic word segmentation; dictionary of traditional Chinese medicine; Hierarchical Hidden Markov Model; part-of-speech tagging
- From: Chinese Journal of Information on Traditional Chinese Medicine, 2015;(2):38-41
- Country: China
- Language:Chinese
Abstract:
Objective: To study an automatic word segmentation scheme suitable for traditional Chinese medical record literature.
Methods: A Hierarchical Hidden Markov Model was used as the segmentation model. A total of 300 ancient and 300 modern medical records were used to build a dictionary of traditional Chinese medicine (TCM) and a test corpus; the test corpus was then segmented and the results were evaluated.
Results: Without the TCM dictionary, word segmentation accuracy for both kinds of medical records was about 75%, and part-of-speech tagging accuracy was 56.74% for ancient records and 64.81% for modern records. With the TCM dictionary, word segmentation accuracy was 90.73% for ancient records and 95.66% for modern records, and part-of-speech tagging accuracy was 78.47% for ancient records and 91.45% for modern records, clearly higher than for the ancient records.
Conclusion: The current scheme provides a preliminary solution to word segmentation of traditional Chinese medical record literature and to part-of-speech tagging of modern medical records. Part-of-speech tagging is largely correct, but tagging of ancient medical records is affected by many factors and needs further study.
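The abstract reports that adding a TCM dictionary sharply raised segmentation accuracy, but it does not say how accuracy was computed. The sketch below illustrates the dictionary effect only. It deliberately swaps in a much simpler technique than the authors' Hierarchical Hidden Markov Model: forward maximum matching over a word list. It scores output by word-level precision/recall on character-offset spans, a common convention that is assumed here. The function names (`max_match`, `seg_scores`), the `max_len` parameter, the two lexicons, and the example sentence are hypothetical illustrations, not taken from the paper or its corpus.

```python
from typing import List, Set, Tuple


def max_match(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Forward maximum matching (a simple stand-in, NOT the paper's HHMM):
    at each position take the longest lexicon entry, else a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a word sequence to (start, end) character offsets."""
    out: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def seg_scores(pred: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall, F1: a word counts as correct only if
    its exact span also appears in the gold segmentation (assumed metric)."""
    p, g = spans(pred), spans(gold)
    correct = len(p & g)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "予六味地黄丸加减"          # hypothetical record fragment
    gold = ["予", "六味地黄丸", "加减"]     # hypothetical gold segmentation
    general = {"地黄", "加减"}              # hypothetical general-purpose lexicon
    tcm = general | {"六味地黄丸"}          # same lexicon plus a TCM formula name

    print(max_match(sentence, general))  # ['予', '六', '味', '地黄', '丸', '加减']
    print(max_match(sentence, tcm))      # ['予', '六味地黄丸', '加减']
    print(seg_scores(max_match(sentence, general), gold))
    print(seg_scores(max_match(sentence, tcm), gold))
```

Without the formula name in the lexicon, the prescription "六味地黄丸" is split into fragments and only two of six predicted words match the gold spans. Adding it restores the gold segmentation. This mirrors, in miniature, why a domain dictionary helps any lexicon-aware segmenter.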