Comparative analysis of subject novelty detection methods in medical literature
10.3969/j.issn.1671-3982.2018.02.004
- VernacularTitle:医学文献主题新颖性探测方法对比分析
- Author:Si-Si CHEN (1); Li-Ping DONG; Dan XU; Ji-Jun GUO
- Author Information:1. China Medical University Library (中国医科大学图书馆)
- Keywords:Subject of literature; Novelty detection; ROC curve; Feasibility analysis
- From:Chinese Journal of Medical Library and Information Science, 2018;27(2):20-25
- Country:China
- Language:Chinese
- Abstract:
Objective: To study the feasibility of using novelty detection models to assess the subject novelty of medical literature, and to compare the advantages and disadvantages of the word-overlap algorithm and the co-word-based inverse document frequency (IDF) quantification algorithm.
Methods: Two novelty detection models were built for PubMed-indexed literature on 8 research subjects. Using the subject novelty of each document as judged by experts as the reference standard, the feasibility of the two models for assessing subject novelty was evaluated with ROC curves and AUC values.
Results: The subject novelty values produced by the word-overlap algorithm fluctuated widely and can therefore reflect content differences between the documents in the dataset. ROC curve and AUC analysis showed that the word-overlap algorithm judged literature novelty with high accuracy, whereas the co-word-based IDF algorithm judged it with low accuracy.
Conclusion: The novelty values detected by the two methods are moderately correlated, and the difference between their mean novelty values is statistically significant. The novelty detected by the word-overlap algorithm is higher than that detected by the co-word-based IDF algorithm.
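The abstract does not state the formulas behind either model. Purely as an illustration, the Python sketch below assumes common textbook formulations rather than the authors' own: word-overlap novelty is taken as 1 minus the largest fraction of a document's words shared with any single earlier document, and co-word IDF novelty as the mean IDF of the document's word pairs over earlier documents, scaled to [0, 1]. Both are then scored against expert labels with ROC AUC. The function names (`tokenize`, `word_overlap_novelty`, `coword_idf_novelty`, `roc_auc`), the whitespace tokenizer, the toy documents, and the expert labels are all hypothetical and are not the study's data or method.

```python
import math
from itertools import combinations


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer; a real pipeline
    # would more likely use MeSH headings or extracted keywords.
    return set(text.lower().split())


def word_overlap_novelty(doc, earlier_docs):
    """Hypothetical word-overlap score: 1 minus the largest fraction of
    doc's words found in any single earlier document."""
    words = tokenize(doc)
    if not words or not earlier_docs:
        return 1.0  # nothing to compare against: treat as fully novel
    best = max(len(words & tokenize(d)) / len(words) for d in earlier_docs)
    return 1.0 - best


def coword_idf_novelty(doc, earlier_docs):
    """Hypothetical co-word IDF score: mean IDF of doc's word pairs over the
    earlier documents, scaled to [0, 1] (1 = pair never seen before)."""
    pairs = set(combinations(sorted(tokenize(doc)), 2))
    if not pairs:
        return 0.0  # fewer than two distinct words: no co-word pairs to score
    if not earlier_docs:
        return 1.0
    earlier = [set(combinations(sorted(tokenize(d)), 2)) for d in earlier_docs]
    n = len(earlier)
    max_idf = math.log(n + 1)  # IDF of a pair absent from every earlier doc

    def idf(pair):
        df = sum(1 for s in earlier if pair in s)  # documents containing the pair
        return math.log((n + 1) / (df + 1))

    return sum(idf(p) for p in pairs) / (len(pairs) * max_idf)


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen expert-novel document
    scores higher than a non-novel one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one novel and one non-novel document")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy corpus in publication order, with made-up expert labels.
docs = [
    "insulin resistance in type 2 diabetes",
    "insulin resistance and obesity in type 2 diabetes",
    "gut microbiome and insulin resistance",
    "crispr screening of tumor suppressor genes",
]
expert_novel = [True, False, True, True]

# Each document is scored only against the documents published before it.
overlap_scores = [word_overlap_novelty(d, docs[:i]) for i, d in enumerate(docs)]
idf_scores = [coword_idf_novelty(d, docs[:i]) for i, d in enumerate(docs)]
print("word overlap:", [round(s, 3) for s in overlap_scores], roc_auc(overlap_scores, expert_novel))
print("co-word IDF: ", [round(s, 3) for s in idf_scores], roc_auc(idf_scores, expert_novel))
```

Computing AUC by pairwise comparison of positive and negative scores, with ties counted as 0.5, gives the same value as integrating the ROC curve with the trapezoidal rule. A library routine such as scikit-learn's `roc_auc_score(y_true, y_score)` could be substituted for `roc_auc`.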