Semantic information retrieval based on the case report dataset of Adverse Drug Reactions Journal
10.3760/cma.j.cn114015-20230920-00691
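- VernacularTitle: 基于《药物不良反应杂志》病例报告数据集的语义信息检索研究 (Study of semantic information retrieval based on the case report dataset of Adverse Drug Reactions Journal)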
- Author: Yayi XIAO (1); Yi LEI; Xin WANG; Xiangrong BAI; Qingxia ZHANG; Xiaolu FEI
- Author Information: 1. Information Center, Xuanwu Hospital, Capital Medical University, Beijing 100053
- Publication Type:Journal Article
- Keywords: Information storage and retrieval; Case reports; Database; Semantic retrieval; Keyword retrieval; Deep learning
- From: Adverse Drug Reactions Journal 2024;26(3):170-177
- Country: China
- Language:Chinese
- Abstract:
Objective: To explore the application value of semantic information retrieval (semantic retrieval) on the case report dataset of Adverse Drug Reactions Journal.
Methods: The dataset consisted of 2 597 PDF files of case reports published in Adverse Drug Reactions Journal from 1999 to 2022. The semantic retrieval system was built with Baidu's PaddlePaddle deep learning framework, the code was written in Python, and the text encoding model was Baidu's RocketQA model. Retrieval performance was evaluated with precision at position k (P@k), recall at position k (R@k), mean reciprocal rank (MRR), mean average precision (MAP) and the precision-recall (P-R) curve. Semantic retrieval and keyword matching retrieval were compared by their recall.
Results: After preprocessing, the theme fields used as the items to be retrieved covered 2 597 documents. After removing duplicates and reorganizing, the query set comprised 1 388 drug name queries and 1 118 adverse reaction/event queries. For semantic retrieval, precision ranged from 0.667 to 1 for drug name queries and from 0.566 to 1 for adverse reaction/event queries, and recall ranged from 0.667 to 0.871 and from 0.566 to 0.863, respectively. For both query types, the P-R curves of the top 1, 3, 5 and 10 documents returned by semantic retrieval showed that, as recall increased, precision declined slowly for the top 1 and 3 documents but markedly for the top 5 and 10 documents. The MRR of the two query types was 0.854 and 0.871, and the MAP was 0.778 and 0.773, respectively. With adverse reaction/event queries, semantic retrieval achieved higher recall than keyword matching retrieval; with drug name queries, keyword matching retrieval generally achieved higher recall than semantic retrieval.
Conclusions: The semantic retrieval system built on Baidu's PaddlePaddle deep learning framework performs well on the case report dataset of Adverse Drug Reactions Journal. Semantic retrieval performs better with adverse reaction/event queries, whereas keyword matching retrieval performs better with drug name queries.
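The evaluation metrics named in the Methods (P@k, R@k, MRR, MAP) can be made concrete with a short sketch. The Python below uses the standard textbook definitions of these measures; the abstract does not state the exact conventions the authors used (for example, whether MAP was truncated at a cutoff or how relevance was judged), so this is an illustrative assumption, not the study's evaluation code. The query and document IDs, and the helper name mean_over_queries, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set

# Standard IR metric definitions (assumed; the study's exact conventions are not
# given in the abstract). `ranked` is the list of document IDs in retrieval order,
# `relevant` is the set of document IDs judged relevant for that query.


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """R@k: fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document; 0 if none was retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of P@k at every rank k holding a relevant document, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_over_queries(
    results: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    metric: Callable[[Sequence[str], Set[str]], float],
) -> float:
    """Average a per-query metric over all queries (MRR, MAP, mean P@k, ...)."""
    scores = [metric(ranked, qrels.get(q, set())) for q, ranked in results.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked results per query and the relevant documents.
results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d9"]}
qrels = {"q1": {"d1"}, "q2": {"d2", "d9"}}

print(f"MRR = {mean_over_queries(results, qrels, reciprocal_rank):.3f}")   # MRR = 0.750
print(f"MAP = {mean_over_queries(results, qrels, average_precision):.3f}")  # MAP = 0.667
for k in (1, 3):
    # Mean P@k and R@k over all queries at cutoff k.
    p = mean_over_queries(results, qrels, lambda r, rel: precision_at_k(r, rel, k))
    rc = mean_over_queries(results, qrels, lambda r, rel: recall_at_k(r, rel, k))
    print(f"k={k}: P@k = {p:.3f}, R@k = {rc:.3f}")
```

The per-query functions score the ranked list returned for a single query, and mean_over_queries averages them, which is how MRR and MAP aggregate over a query set. Computing mean (R@k, P@k) pairs at cutoffs such as 1, 3, 5 and 10 is one common way to trace P-R behaviour, though the abstract does not describe exactly how the authors built their curves.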