1.Time series study on influence of sulfur dioxide exposure on hospitalization of chronic obstructive pulmonary disease in Lanzhou from 2016 to 2020
Sheng LIN ; Boxi FENG ; Yongyue LI ; Yiwei HUANG ; Kai ZHENG ; Mingxuan LIU ; Yingying YANG ; Xingmin WEI ; Jianjun WU
Journal of Environmental and Occupational Medicine 2026;43(4):451-457
Background In 2021, chronic obstructive pulmonary disease (COPD) emerged as the fourth leading cause of death in the world. However, findings on the impact of air pollutants on COPD remain inconsistent across current studies. Objective To analyze the relationship between ambient sulfur dioxide (SO2) exposure and hospital admissions for COPD in Lanzhou, and to examine how the effects of SO2 differ across genders, age groups, and seasons. Methods A total of
2.Spicy food consumption and risk of vascular disease: Evidence from a large-scale Chinese prospective cohort of 0.5 million people.
Dongfang YOU ; Dianjianyi SUN ; Ziyu ZHAO ; Mingyu SONG ; Lulu PAN ; Yaqian WU ; Yingdan TANG ; Mengyi LU ; Fang SHAO ; Sipeng SHEN ; Jianling BAI ; Honggang YI ; Ruyang ZHANG ; Yongyue WEI ; Hongxia MA ; Hongyang XU ; Canqing YU ; Jun LV ; Pei PEI ; Ling YANG ; Yiping CHEN ; Zhengming CHEN ; Hongbing SHEN ; Feng CHEN ; Yang ZHAO ; Liming LI
Chinese Medical Journal 2025;138(14):1696-1704
BACKGROUND:
Spicy food consumption has been reported to be inversely associated with mortality from multiple diseases. However, the effect of spicy food intake on the incidence of vascular diseases in the Chinese population remains unclear. This study was conducted to explore this association.
METHODS:
This study was performed using the large-scale China Kadoorie Biobank (CKB) prospective cohort of 486,335 participants. The primary outcomes were vascular disease, ischemic heart disease (IHD), major coronary events (MCEs), cerebrovascular disease, stroke, and non-stroke cerebrovascular disease. A Cox proportional hazards regression model was used to assess the association between spicy food consumption and incident vascular diseases. Subgroup analysis was also performed to evaluate the heterogeneity of the association between spicy food consumption and the risk of vascular disease stratified by several basic characteristics. In addition, the joint effects of spicy food consumption and the healthy lifestyle score on the risk of vascular disease were also evaluated, and sensitivity analyses were performed to assess the reliability of the association results.
RESULTS:
During a median follow-up time of 12.1 years, a total of 136,125 patients with vascular disease, 46,689 patients with IHD, 10,097 patients with MCEs, 80,114 patients with cerebrovascular disease, 56,726 patients with stroke, and 40,098 patients with non-stroke cerebrovascular disease were identified. Participants who consumed spicy food 1-2 days/week (hazard ratio [HR] = 0.95, 95% confidence interval [95% CI] = [0.93, 0.97], P < 0.001), 3-5 days/week (HR = 0.96, 95% CI = [0.94, 0.99], P = 0.003), and 6-7 days/week (HR = 0.97, 95% CI = [0.95, 0.99], P = 0.002) had a significantly lower risk of vascular disease than those who consumed spicy food less than once a week (Ptrend < 0.001), especially in those who were younger and living in rural areas. Notably, the disease-based subgroup analysis indicated that the inverse associations remained in IHD (Ptrend = 0.011) and MCEs (Ptrend = 0.002) risk. Intriguingly, there was an interaction effect between spicy food consumption and the healthy lifestyle score on the risk of IHD (Pinteraction = 0.037).
CONCLUSIONS:
Our findings support an inverse association between spicy food consumption and vascular disease in the Chinese population, which may provide additional dietary guidance for the prevention of vascular diseases.
Humans; Male; Female; Prospective Studies; Middle Aged; Aged; Vascular Diseases/etiology*; Risk Factors; China/epidemiology*; Adult; Proportional Hazards Models; Cerebrovascular Disorders/epidemiology*; East Asian People
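As a rough consistency check on the hazard ratios reported in the abstract above, the sketch below back-calculates the log-scale standard error and Wald z statistic from a reported HR and its 95% CI. It assumes the interval is a Wald interval that is symmetric on the log scale with a 1.96 multiplier; the abstract does not state how the intervals were computed, and the function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover log-scale SE, Wald z, and two-sided P from an HR and its 95% CI.

    Assumes CI = exp(log(HR) +/- z_crit * SE). This is an assumption made
    for illustration, not something stated in the abstract.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p

# Values reported for 1-2 days/week versus less than once a week.
se, z, p = wald_from_ci(0.95, 0.93, 0.97)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.2g}")
```

Because the published HR and CI are rounded to two decimals, the recovered P value is only approximate; for the 1-2 days/week group it falls well below 0.001, in line with the reported result.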
3.Statistical methods for extremely unbalanced data in genome-wide association study (2)
Ning XIE ; Wenjian BI ; Zhongwen ZHANG ; Fang SHAO ; Yongyue WEI ; Yang ZHAO ; Ruyang ZHANG ; Feng CHEN
Chinese Journal of Epidemiology 2025;46(1):147-153
Extremely unbalanced data refers to datasets in which independent or dependent variables show severe imbalances in proportions, which can cause classical test statistics to deviate from their theoretical distributions and make type Ⅰ error difficult to control. The increased availability of genome-wide resources from large population cohorts has highlighted the growing demand for efficient and accurate statistical methods for processing extremely unbalanced data, in order to advance the development of genetic statistical methods. This paper introduces two correction methods widely used in current genome-wide association studies for extremely unbalanced data, i.e., Firth correction and saddle point approximation, describes their effectiveness in controlling type Ⅰ error as confirmed by simulation experiments, and summarizes the commonly used software for extremely unbalanced genomic data, to provide a theoretical reference and suggestions for the statistical analysis of extremely unbalanced data in the future.
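To make the Firth correction named in the abstract above concrete, here is a minimal sketch of Firth-penalized logistic regression using the standard modified score, X^T(y - p + h(0.5 - p)), where h holds the diagonal of the weighted hat matrix. It is an illustration of the general technique, not the implementation in any of the software the paper surveys; the function name `firth_logistic` and the `max_step` safeguard are assumptions made here.

```python
import numpy as np

def firth_logistic(X, y, max_iter=200, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression via a Newton-type iteration.

    X must already contain an intercept column if one is wanted.
    Returns the penalized coefficient estimates.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2).
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: the usual score plus the bias-reducing term.
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                 # crude safeguard (assumption)
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    return beta

# Completely separated toy data: ordinary maximum likelihood would diverge,
# while the penalized estimate stays finite.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
y = np.array([0, 0, 1, 1])
print(firth_logistic(X, y))
```

For this saturated two-group example the penalized fit is equivalent to adding 0.5 to each cell of the 2x2 table, so the fitted probabilities are 1/6 and 5/6 instead of 0 and 1.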
4.Advances in the application of machine learning-related combined models in infectious disease prediction
Weihua HU ; Huimin SUN ; Yikun CHANG ; Jinwei CHEN ; Zhicheng DU ; Yongyue WEI ; Yuantao HAO
Chinese Journal of Epidemiology 2025;46(6):1085-1094
When the epidemiology of infectious diseases is complex, disease prediction studies based on a single model often struggle to capture the multidimensional nature of disease transmission. In recent years, combining different models to improve infectious disease prediction has gradually become a research trend and hotspot. Existing studies have shown that combined models usually have higher prediction performance and better generalization ability. Current combined models mainly pair machine learning with other models, including time-series models, dynamic models, etc. In addition, ensemble learning, which combines diverse machine learning techniques, also holds significant importance across various research domains. This paper reviews the progress of applying machine learning-related combined models in infectious disease prediction, to promote the innovation and practice of combined models for infectious diseases and to help build smarter and more efficient infectious disease early warning and prediction methods and systems.
5.Progress in application of compartment model-related combined models in infectious disease prediction
Weihua HU ; Huimin SUN ; Yikun CHANG ; Jinwei CHEN ; Zhicheng DU ; Yongyue WEI ; Yuantao HAO
Chinese Journal of Epidemiology 2025;46(7):1289-1296
Methods such as compartmental models, agent-based models, time series models, and machine learning can be used for the prediction of infectious disease incidence. When disease epidemics are complex, it is often difficult to use a single model to comprehensively and accurately capture the multidimensional nature of the disease. Exploring the combined application of different models has gradually become a research trend and hotspot in recent years, and the prediction performance of combined models is often better than that of single ones. Current research related to combined models mainly focuses on machine learning or compartmental models. In this review, we focus on the combination of compartmental models and other models, and summarize their combination principles, application progress, and advantages and disadvantages, with the aim of promoting the innovation and application of combined models for infectious disease incidence prediction and establishing more intelligent and efficient early warning and prediction methods or systems for the prevention and control of infectious disease.
9.Contribution of the large-scale population cohort in disease risk prediction model study: taking United Kingdom Biobank as an example
Chenxu ZHU ; Yuxin SONG ; Yuantao HAO ; Feng CHEN ; Yongyue WEI
Chinese Journal of Epidemiology 2024;45(10):1433-1440
The disease risk prediction model is the basis of precision prevention and an essential reference for clinical treatment decisions. Developing risk prediction models requires a large amount of high-quality data, and large population cohort studies are an important source of such data. The United Kingdom Biobank (UKB), as a mega-population cohort and biobank, has played an essential role in the exploration of disease etiology and research related to disease prevention and control, with its rich baseline and follow-up data and its globally shared concepts and mechanisms. This study followed PRISMA guidelines and included 210 articles with corresponding authors from 18 countries, of which 58 (27.62%) were from the UKB. A total of 491 disease risk prediction models were extracted for cancer, cardiovascular and cerebrovascular diseases, endocrine and metabolic diseases, respiratory diseases, and other diseases and their subgroups, of which 132 were developed in the UKB without validation, 183 were developed in the UKB with internal validation, 17 were developed in the UKB with external validation, and 159 were developed externally and validated in the UKB. A total of 188 models used only macro variables (38.29%), and 303 models combined macro and micro variables (61.71%). Model construction methods included survival outcome models, logistic regression, and machine learning. Survival outcome models were dominated by Cox proportional hazards regression models, with a few models considering competing risks, accelerated failure time models, or different baseline hazard functions. Machine learning models included random forest, XGBoost, CatBoost, support vector machine, convolutional neural network, and other methods. The UKB is an essential resource for multiple disease risk prediction modeling studies.
10.Statistical methods for extremely unbalanced data in genome-wide association study (1)
Ning XIE ; Wenjian BI ; Zhongwen ZHANG ; Fang SHAO ; Yongyue WEI ; Yang ZHAO ; Ruyang ZHANG ; Feng CHEN
Chinese Journal of Epidemiology 2024;45(11):1582-1589
Extremely unbalanced data here refers to datasets in which the values of independent or dependent variables are severely imbalanced in proportion, such as an extremely unbalanced case-control ratio, a very low disease incidence rate, heavily censored time-to-event data, and low-frequency or rare variants. In such scenarios, the test statistic derived from a classical statistical method, e.g., the logistic regression model or the Cox proportional hazards regression model, might deviate from its theoretical asymptotic distribution, resulting in inflation or deflation of type I error. With the increased availability and exploration of resources from large-scale population cohorts in genome-wide association study (GWAS), there is a growing demand for effective and accurate statistical approaches to handle extremely unbalanced data in independent and non-independent samples. Our study first introduces classical statistical methods in genetic statistics, and then uses simulation experiments to summarize how these methods fail when dealing with extremely unbalanced data, in order to draw researchers' attention to extremely unbalanced data in GWAS.
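The kind of simulation experiment described in the abstract above can be sketched as follows: genotypes are generated independently of case-control status (so the null hypothesis holds), a classical score test is applied, and the empirical rejection rate is compared with the nominal level. The sample sizes, allele frequencies, and function names below are illustrative assumptions, not the settings used in the paper.

```python
import math
import numpy as np

def score_test_pvalue(g, y):
    """Score test of no association between genotype g and binary outcome y."""
    y_bar = y.mean()
    g_centered = g - g.mean()
    u = np.sum(g_centered * (y - y_bar))                  # score statistic
    v = y_bar * (1.0 - y_bar) * np.sum(g_centered ** 2)   # its null variance
    if v <= 0.0:
        return 1.0  # no genotype or outcome variation: nothing to test
    # Upper tail of a chi-square(1) distribution evaluated at u^2 / v.
    return math.erfc(math.sqrt(u * u / v / 2.0))

def empirical_type1_error(n_cases, n_controls, maf, alpha, n_rep, seed=0):
    """Fraction of null replicates in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    y = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])
    rejections = 0
    for _ in range(n_rep):
        # Additive genotype coding (0/1/2), drawn independently of y.
        g = rng.binomial(2, maf, size=y.size).astype(float)
        if score_test_pvalue(g, y) < alpha:
            rejections += 1
    return rejections / n_rep

balanced = empirical_type1_error(1000, 1000, maf=0.05, alpha=0.05, n_rep=1000)
unbalanced = empirical_type1_error(20, 1980, maf=0.005, alpha=0.05, n_rep=1000)
print(f"balanced: {balanced:.3f}, unbalanced: {unbalanced:.3f}")
```

In the balanced, common-variant setting the rejection rate should sit close to the nominal 0.05; how far the unbalanced, rare-variant setting departs from it depends on the chosen ratio, allele frequency, and significance level, and departures are typically most visible at the much stricter thresholds used in GWAS.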
