Progress in method development and application of distributed learning for estimation of epidemiological effect
10.3760/cma.j.cn112338-20241018-00642
- VernacularTitle:分布式学习用于流行病学效应值估计的方法开发与应用进展
- Author:Junting YANG 1; Xin GAO; Xiaoxuan WANG; Mengdi ZHANG; Xin CHEN; Yulin WANG; Zhike LIU; Siyan ZHAN
Author Information
1. Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191
- Publication Type:Journal Article
- Keywords:Big data; Multi-center; Distributed learning; Epidemiology; Effect estimation
- From:Chinese Journal of Epidemiology 2025;46(5):895-906
- Country:China
- Language:Chinese
- Abstract:
Objective: To systematically review progress in the development and application of distributed learning methods for estimating epidemiological effects, and to provide a methodological reference for multi-center studies.
Methods: We searched for English-language papers published up to December 31, 2023, using the keywords "health/medical big data" and "distributed/federated learning". After consulting experts, we set inclusion and exclusion criteria and created a data extraction framework. We extracted basic study information, including methods, applications, and evaluation. Two researchers independently screened the papers and extracted information. EndNote 20 was used to manage the literature and EpiData to manage the data.
Results: A total of 3,444 papers were retrieved, and 29 were included in the final analysis. Most (25/29, 86.2%) were published in or after 2019, and most came from the United States (21/29, 72.4%). For the estimation of epidemiological effects, 22 distributed learning methods had been developed, covering logistic regression (8), Cox regression (8), Poisson regression (2), and generalized linear mixed models (GLMM) (4), along with three platforms for distributed analysis (VLP, Vantage6, AusCAT). The 29 papers described 45 applications: 20 (44.4%) focused on building prediction models and 25 (55.6%) on association analysis. Importantly, except for the GLMM methods, current distributed learning methods can estimate effects with little bias in 1-3 rounds of communication. These methods show less bias than meta-analysis, especially when handling data heterogeneity and rare outcomes. However, few studies examined how differences in data structure and sparse data affect results, an area that requires further research.
Conclusion: Distributed learning shows promise for epidemiological effect estimation, but it is still at an early stage of development and requires further research on handling data heterogeneity and improving communication efficiency.
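
As a rough illustration of the general idea behind distributed effect estimation, the sketch below fits a logistic regression across several sites that share only aggregate summaries (gradient and Hessian) with a coordinator, never individual records. It is a generic distributed Newton-Raphson scheme on simulated data, not any of the 22 methods covered by the review, and it typically needs more communication rounds than the 1-3 reported for those methods. All function names, site sizes, and parameter values are assumptions chosen for illustration.

```python
import numpy as np


def simulate_site(rng, n, beta, shift):
    """Simulate one site's data (hypothetical): intercept plus one covariate
    whose mean differs by site to mimic between-site heterogeneity."""
    X = np.column_stack([np.ones(n), rng.normal(shift, 1.0, size=n)])
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    y = rng.binomial(1, p)
    return X, y


def local_summaries(X, y, beta):
    """Computed inside a site: gradient and negative Hessian of the
    logistic log-likelihood at the current beta. Only these leave the site."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = (X * (p * (1.0 - p))[:, None]).T @ X
    return grad, hess


def distributed_newton(sites, dim, max_rounds=25, tol=1e-10):
    """Coordinator loop: each round sums the site summaries and takes one
    Newton step. Returns the estimate and the number of rounds used."""
    beta = np.zeros(dim)
    for rounds in range(1, max_rounds + 1):
        parts = [local_summaries(X, y, beta) for X, y in sites]
        grad = sum(g for g, _ in parts)
        hess = sum(h for _, h in parts)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, rounds


rng = np.random.default_rng(0)
true_beta = np.array([-1.0, 0.5])  # assumed log-odds intercept and effect
site_specs = [(400, 0.0), (600, 0.5), (500, -0.5)]  # (sample size, covariate shift)
sites = [simulate_site(rng, n, true_beta, shift) for n, shift in site_specs]

# Distributed fit: only summaries are exchanged between sites and coordinator.
beta_dist, rounds = distributed_newton(sites, dim=2)

# Reference fit on pooled individual-level data (not possible in a real
# distributed setting; shown only to check the sketch).
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled, _ = distributed_newton([(X_all, y_all)], dim=2)

print("distributed:", beta_dist, "rounds:", rounds)
print("pooled:     ", beta_pooled)
```

Because the log-likelihood is a sum over individuals, the summed site gradients and Hessians equal their pooled counterparts, so this scheme reproduces the pooled estimate up to floating-point error. A meta-analysis of separately fitted site estimates has no such guarantee.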