Application of time series and machine learning models in predicting the trend of sickness absenteeism among primary and secondary school students in Shanghai
10.16835/j.cnki.1000-9817.2025082
- VernacularTitle:时间序列和机器学习模型在上海市中小学生因病缺课趋势预测中的应用
- Author:
WANG Zhengzhong, ZHANG Zhe, ZHOU Xinyi, YUAN Linlin, ZHAI Yani, SUN Lijing, LUO Chunyan
Author Information
1. Institute of Child and Adolescent Health, Shanghai Municipal Center for Disease Control and Prevention, Shanghai (200336), China
- Publication Type:Journal Article
- Keywords:
Time;
Sequence analysis;
Disease;
Students
- From:
Chinese Journal of School Health
2025;46(3):426-430
- Country: China
- Language:Chinese
- Abstract:
Objective: To analyze the temporal patterns of sickness absenteeism among primary and secondary school students in Shanghai and to identify models suitable for predicting the timing and intensity of peaks in absenteeism rates.
Methods: Seasonal and trend decomposition using Loess (STL) was used to analyze seasonal and long-term trends in sickness absenteeism among primary and secondary school students in Shanghai from September 1, 2010 to June 30, 2018. Hierarchical clustering based on dynamic time warping (DTW) was used to group absenteeism symptoms with similar temporal patterns. Using the historical data, several time series algorithms and machine learning models were constructed and evaluated to optimize the accuracy of predicting sickness absenteeism trends.
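The DTW-based clustering step can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each symptom is represented as a 1-D series (for example, the mean absenteeism rate by calendar month), z-normalizes each series so that clustering compares seasonal shape rather than overall level, uses the classic O(nm) DTW recursion with an absolute-difference cost, and builds the hierarchy with SciPy average linkage. The normalization, the linkage method, and the helper names `z_normalize`, `dtw_distance`, and `cluster_series` are assumptions, because the abstract does not specify them. The STL decomposition step is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def z_normalize(x):
    """Scale a series to zero mean and unit variance so clustering compares shape, not level."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd < 1e-12:  # a flat series carries no shape information
        return np.zeros_like(x)
    return (x - x.mean()) / sd


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            acc[i, j] = step + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def cluster_series(series, n_clusters, method="average"):
    """Agglomerative clustering of series on a pairwise DTW distance matrix (hypothetical helper)."""
    normed = [z_normalize(s) for s in series]
    k = len(normed)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(normed[i], normed[j])
    # linkage expects a condensed (upper-triangle) distance vector
    tree = linkage(squareform(dist, checks=False), method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    # Synthetic monthly profiles (hypothetical, not the study's data); month 0 = January.
    months = np.arange(12)
    winter = np.cos(2 * np.pi * months / 12)                # peaks in January
    winter_late = np.cos(2 * np.pi * (months - 0.5) / 12)   # same shape, half a month later
    summer = -winter                                        # peaks in July
    summer_late = -winter_late
    flat = np.ones(12)                                      # no seasonality
    labels = cluster_series([winter, winter_late, summer, summer_late, flat], n_clusters=3)
    print(labels)
```

On this toy input the two winter-peaking profiles share one label, the two summer-peaking profiles share another, and the flat profile forms its own cluster, which mirrors the three-way grouping described in the Results. Z-normalization matters here: without it, symptoms with high absolute rates would be separated from rare symptoms with the same seasonal shape.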
Results: During the study period, the mean rate of new sickness absenteeism per academic year was 16.86 per 10 000 person-days. Sickness absenteeism showed both seasonality and a long-term upward trend, peaking in the 2017 academic year (22.47 per 10 000 person-days). Absenteeism symptoms fell into three categories: those with high incidence in winter and spring (respiratory symptoms, fever, general discomfort, etc.), those with high incidence in summer (eye symptoms, nosebleeds, etc.), and those without obvious seasonality (skin symptoms, accidental injuries, etc.). The time series models effectively predicted the trend of sickness absenteeism, although their accuracy in predicting peak intensity was relatively low. Among the models, the multilayer perceptron (MLP) performed best, with a root mean squared error (RMSE) of 8.96 and a mean absolute error (MAE) of 4.37, which were 36.51% and 39.02% lower, respectively, than those of the baseline model.
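The error metrics and the reported improvements over the baseline can be made concrete with a short sketch. It assumes the standard definitions of RMSE and MAE and reads "36.51% and 39.02% lower" as relative reductions, (baseline - model) / baseline. Under that reading, inverting the formula gives the baseline errors implied by the MLP results. The helper names are hypothetical, and the implied baseline values are derived here rather than reported in the abstract.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean(|y - y_hat|)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which the model's error is below the baseline's error."""
    return (baseline - model) / baseline * 100.0


def implied_baseline(model: float, reduction_pct: float) -> float:
    """Invert relative_reduction: baseline error implied by a model error and its % reduction."""
    return model / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # Baseline errors implied by the reported MLP results (derived, not reported).
    print(round(implied_baseline(8.96, 36.51), 2))  # RMSE -> 14.11
    print(round(implied_baseline(4.37, 39.02), 2))  # MAE  -> 7.17
```

Because RMSE squares each error before averaging, it weights large misses more heavily than MAE does. An RMSE roughly twice the MAE is therefore consistent with a small number of large errors, such as those at peaks, where the abstract reports lower accuracy.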
Conclusion: Time series models and machine learning algorithms can effectively predict the trend of sickness absenteeism, and targeted prevention and control measures can be implemented during the peak periods of absenteeism caused by different symptoms.