Application of Bayesian probabilistic linkage model in birth and death data linking
10.19428/j.cnki.sjpm.2024.23137
- VernacularTitle:贝叶斯概率链接模型在出生和死亡数据链接中的应用
- Author: Huiting YU (1); Renzhi CAI (1); Weixiao LIN (1); Jingyi NI (2); Naisi QIAN (1); Tian XIA (1); Fan WU (2)
- Author Information:
1. Department of Health Information, Shanghai Municipal Center for Disease Control and Prevention, Shanghai 200336, China
2. School of Public Health, Fudan University, Shanghai 200032, China
- Publication Type:Journal Article
- Keywords: multi-source data; Bayesian probabilistic linkage model; Jaro-Winkler algorithm; confusion matrix
- From: Shanghai Journal of Preventive Medicine 2024;36(1):98-103
- Country:China
- Language:Chinese
- Abstract:
Objective: To explain the principles and methods of the Bayesian probabilistic linkage model and to demonstrate its performance in linking birth and death data. Methods: Data on 199 025 infants born in 2017 and 1 512 infants who died in 2017 or 2018 were collected from the Shanghai birth and death registration system. After cleaning, the data were divided into monthly blocks and fully linked. The Jaro-Winkler algorithm and Euclidean distance were used to measure the similarity of the matching fields. A Bayesian probabilistic linkage model was constructed, and linkage performance was evaluated using a confusion matrix. Results: The Bayesian probabilistic linkage model effectively linked the infant birth and death data, showing that 36.71% of infants who died in Shanghai were born outside the city and that the probability of infant death was 2.6‰. On the test set, the confusion matrix gave a recall of 0.86, a precision of 0.76, and an F-score of 0.81. Conclusion: In practical application, the Bayesian probabilistic linkage model performed well, enabling the construction of birth-death cohorts that more accurately reflect the true level of infant mortality. Using this technique to integrate data from different departments can improve research efficiency in public health.
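
The abstract does not give the model's equations, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes that each field comparison has been reduced to agree/disagree (for example by thresholding a Jaro-Winkler similarity), that fields are conditionally independent, and that the field names, the m/u probabilities, and the prior are hypothetical illustration values. The second function checks that the reported precision and recall are consistent with the reported F-score.

```python
def posterior_match_probability(agreements, m_probs, u_probs, prior):
    """Posterior probability that a record pair is a true match (Bayes' rule).

    agreements: dict field -> bool, whether the two records agree on the field
    m_probs:    dict field -> P(agree | true match)      (hypothetical values)
    u_probs:    dict field -> P(agree | non-match)       (hypothetical values)
    prior:      P(true match) before looking at any field (hypothetical value)
    Assumes fields are conditionally independent given match status.
    """
    like_match = 1.0
    like_nonmatch = 1.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        like_match *= m if agrees else (1.0 - m)
        like_nonmatch *= u if agrees else (1.0 - u)
    numerator = prior * like_match
    return numerator / (numerator + (1.0 - prior) * like_nonmatch)


def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical field names and probabilities, for illustration only.
m_probs = {"mother_name": 0.95, "birth_date": 0.98, "sex": 0.99}
u_probs = {"mother_name": 0.01, "birth_date": 0.03, "sex": 0.50}
pair = {"mother_name": True, "birth_date": True, "sex": False}

print(posterior_match_probability(pair, m_probs, u_probs, prior=0.001))

# Consistency check on the figures reported in the abstract.
print(round(f_score(0.76, 0.86), 2))  # 0.81
```

In this sketch, a pair would be declared a link when its posterior exceeds a chosen threshold; the confusion matrix then compares those decisions with known true links to obtain precision and recall.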