Health literacy prediction models based on machine learning methods: a scoping review

PAN Xiang; TONG Yingge; LI Yixuan; NI Ke; CHENG Wenqian; XIN Mengyu; HU Yuying

Journal of Preventive Medicine 2025;37(2):148-153

doi:10.19485/j.cnki.issn2096-5087.2025.02.009

Keywords

health literacy; prediction model; machine learning; scoping review

Country

China

Language

Chinese

Abstract

Objective: To conduct a scoping review of the types, construction methods and predictive performance of health literacy prediction models based on machine learning methods, so as to provide a reference for the improvement and application of such models.

Methods: Publications on health literacy prediction models built with machine learning methods were retrieved from CNKI, Wanfang Data, VIP, PubMed and Web of Science, from database inception to May 1, 2024. The quality of the literature was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Basic characteristics, modeling methods, data sources, missing value handling, predictors and predictive performance were reviewed.

Results: A total of 524 publications were retrieved, and 22 publications published between 2007 and 2024 were finally included. These publications involved 48 health literacy prediction models, of which 25 (52.08%) had a high risk of bias, with the main issues concerning missing value handling, predictor selection and model evaluation methods. Modeling methods included regression models, tree-based machine learning methods, support vector machines and neural network models. Predictors primarily covered factors at four levels: individual, interpersonal, organizational and society/policy, with age, educational level, economic status, health status and internet use appearing frequently. Internal validation was conducted in 14 publications, and external validation was conducted in 4 publications. Forty-two models reported the area under the receiver operating characteristic curve (AUC), which ranged from 0.52 to 0.983, indicating good discrimination.

Conclusion: Health literacy prediction models based on machine learning methods perform well, but have deficiencies in risk of bias, data processing and validation.
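
The discrimination measure reported in the Results, the area under the receiver operating characteristic curve (AUC), can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the method of any reviewed study; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The last line reproduces the reported share of models at high risk of bias (25 of 48).

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = adequate health literacy, 0 = inadequate.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)

# Share of models at high risk of bias, from the counts in the Results.
high_bias_share = round(25 / 48 * 100, 2)
```

Under this definition an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported range of 0.52 to 0.983 should be read.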
