Factors affecting and identification of key environmental determinants of the Oncomelania hupensis snail density in the Yangtze River Delta based on machine learning models
10.16250/j.32.1915.2025252
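- VernacularTitle: 基于机器学习的长江三角洲地区钉螺密度影响因素分析及关键环境因子识别 (Analysis of factors affecting Oncomelania snail density and identification of key environmental factors in the Yangtze River Delta region based on machine learning)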
- Author:
Yinlong LI¹;
Qin LI¹;
Suying GUO¹;
Shizhen LI¹;
Lijuan ZHANG¹;
Chunli CAO¹;
Jing XU²
- Author Information:
1. National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory on Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Ministry of Science and Technology, Shanghai 200025, China
2. National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory on Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Ministry of Science and Technology, Shanghai 200025, China; School of Global Health, Shanghai Jiao Tong University School of Medicine and Chinese Center for Tropical Diseases Research, Shanghai 200025, China
- Publication Type: Journal Article
- Keywords:
Oncomelania snail;
Density;
Influencing factor;
Environmental factor;
Machine learning model;
XGBoost model;
Yangtze River Delta
- From:
Chinese Journal of Schistosomiasis Control
2026;38(1):14-19
- Country: China
- Language: Chinese
- Abstract:
Objective To identify the factors affecting Oncomelania hupensis snail density in the Yangtze River Delta region, and the key environmental factors among them, using machine learning methods.

Methods Administrative village-level O. hupensis snail survey data for the Yangtze River Delta (Shanghai Municipality and Jiangsu, Zhejiang and Anhui provinces) from 2011 to 2021 were retrieved from the Information Management System for Parasitic Disease Control of the Chinese Center for Disease Control and Prevention. Environmental data were obtained from the Google Earth Engine platform, including elevation, slope, terrain, normalized difference vegetation index (NDVI), vegetation type, soil type, total petroleum hydrocarbon (TPH), ammonium nitrogen, inorganic nitrogen, dissolved oxygen, water pH, chemical oxygen demand (COD) and inorganic phosphorus. Climatic data for the study region, including annual precipitation, aridity index and annual mean temperature (AMT), were retrieved from the Copernicus Climate Data Store. The 2011 to 2021 snail survey data were randomly divided into a training set (70%) and a test set (30%), and five machine learning models were built in R 4.3.0 and compared for predicting snail density: random forest (RF), eXtreme gradient boosting (XGBoost), support vector machine (SVM), gradient boosting machine (GBM) and neural network (NN). The XGBoost model was used to construct the predictive model of snail density and to quantify the impact of each environmental factor on snail distribution. SHapley Additive exPlanations (SHAP) values were calculated to estimate the average contribution of each variable to the model predictions, and the core environmental factors affecting snail population density were screened on that basis.
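The abstract does not say how SHAP values were computed for the XGBoost model (tree models are usually explained with a tree-specific algorithm such as TreeSHAP). To make the underlying quantity concrete, the Python sketch below computes exact Shapley values by brute-force enumeration of feature coalitions for a small hypothetical model, filling features outside a coalition with a baseline value (here the sample mean of each feature). It then ranks features by mean absolute SHAP value, assuming that is the "average contribution" used for ranking (the usual convention), and reports the cumulative share of the top k features, the kind of figure behind the "top four factors account for 75%" result. The model, feature names and data are illustrative assumptions, not the study's.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition S take their baseline value, so the payoff
    is v(S) = model(z) with z[i] = x[i] if i in S else baseline[i].
    Enumerates all 2^(n-1) coalitions per feature: only feasible for small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the others
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def global_importance(model, samples, names):
    """Rank features by mean |SHAP| over samples (baseline = feature means)."""
    n = len(names)
    baseline = [sum(row[i] for row in samples) / len(samples) for i in range(n)]
    totals = [0.0] * n
    for row in samples:
        for i, contribution in enumerate(shapley_values(model, row, baseline)):
            totals[i] += abs(contribution)
    ranked = [(names[i], totals[i] / len(samples)) for i in range(n)]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def cumulative_share(ranked, k):
    """Fraction of total mean |SHAP| carried by the top-k ranked features."""
    total = sum(value for _, value in ranked)
    return sum(value for _, value in ranked[:k]) / total


if __name__ == "__main__":
    # Hypothetical toy model: linear terms plus an interaction of features a and c.
    def toy_model(z):
        return 2 * z[0] + z[1] + z[0] * z[2]

    samples = [[1, 2, 3], [0, 1, 1], [2, 0, 2], [1, 1, 0]]
    ranked = global_importance(toy_model, samples, ["a", "b", "c"])
    print(ranked)
    print("top-2 share:", cumulative_share(ranked, 2))
```

With the 16 factors used in the study, brute force would need 2^15 coalitions per feature per sample, which is why tree-specific algorithms are preferred in practice; the enumeration here is only meant to show what a SHAP value measures.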
Results Among the five machine learning models, XGBoost showed the best overall performance, with a coefficient of determination (R²) of 0.855, mean squared error (MSE) of 0.188, root mean squared error (RMSE) of 0.434 and mean absolute error (MAE) of 0.155. In the XGBoost analysis of factors affecting snail density, the four highest-ranked of the 16 environmental factors by SHAP value were annual precipitation, elevation, aridity index and NDVI, which together accounted for 75% of the cumulative SHAP contribution, exceeding that of the other environmental factors. When NDVI exceeded 0.6, snail density increased with NDVI and peaked at an NDVI of 0.8 (1.60 snails/0.1 m²). Snail density increased with elevation between 14 and 40 m. Snail density rose slowly as annual precipitation increased from 900 to 1 300 mm, then rose rapidly to a peak (1.52 snails/0.1 m²) between 1 300 and 1 500 mm. In addition, snail density rose rapidly to its maximum (1.60 snails/0.1 m²) as the aridity index increased from 0.8 to 1.1, and declined gradually once the aridity index exceeded 1.1.

Conclusions The XGBoost model performs excellently in predicting O. hupensis snail density and identifying key environmental factors in the Yangtze River Delta region. Annual precipitation, elevation, aridity index and NDVI are key environmental factors affecting the distribution and density of O. hupensis snails in this region.
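The Methods hold out a random 30% of records for testing, and the Results compare the models by R², MSE, RMSE and MAE on held-out predictions. The stdlib Python sketch below shows that evaluation step under stated assumptions: a plain shuffled split with a hypothetical seed (the abstract reports no seed or stratification), and R² defined as 1 − SS_res/SS_tot. Some R tooling (for example caret's `postResample`) reports the squared Pearson correlation as R² instead, and the abstract does not say which definition was used. As a consistency check on the reported values, RMSE is the square root of MSE, and √0.188 ≈ 0.434, matching the reported RMSE.

```python
import math
import random


def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test).

    The seed value is a hypothetical choice; the abstract does not report one.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(y_true, y_pred):
    """R^2 (1 - SS_res/SS_tot), MSE, RMSE and MAE for observed vs predicted."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    # Hypothetical observed densities and predictions, not study data.
    observed = [0.0, 0.4, 1.2, 0.8, 1.6, 0.2]
    predicted = [0.1, 0.5, 1.0, 0.9, 1.4, 0.2]
    print(regression_metrics(observed, predicted))

    train, test = train_test_split(range(10))
    print(len(train), len(test))
```

Comparing candidate models in this setup amounts to fitting each on the training rows, predicting the test rows, and comparing the four metrics side by side.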