1. Exploration of DRG Optimized Grouping of AIDS Patients based on Decision Tree Model
Pingping LU ; Yumei LI ; Junxia WU
Chinese Journal of Health Statistics 2025;42(3):340-343
Objective To explore the optimal diagnosis-related group (DRG) grouping scheme for patients with acquired immunodeficiency syndrome (AIDS), develop cost standards, and provide a reference for implementing DRG payment reform in this region. Methods Data from the front pages of the medical records of 1,987 AIDS patients treated at AIDS-designated hospitals in Nantong between 2018 and 2022 were collected. Factors influencing hospitalization costs were analyzed with univariate analysis and multiple linear regression to screen classification nodes, and the DRG grouping scheme was constructed with a decision tree model. Results Complications or comorbidities, the number of other diagnoses, and case type were used as classification nodes, forming 6 case groups with corresponding hospitalization cost standards. Differences in hospitalization costs between groups were statistically significant (P<0.001), the reduction in variance (RIV) was 51.0%, and the coefficient of variation (CV) of every group was below 1 (0.33~0.63), indicating good inter-group heterogeneity and intra-group homogeneity. Conclusion The grouping scheme constructed with the decision tree model is relatively reasonable, and the standard costs objectively reflect patients' actual level of medical resource consumption, providing a reference for improving DRG grouping and cost-based payment for AIDS patients in the region.
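The RIV and CV figures reported above can be computed directly from grouped cost data. The sketch below assumes the commonly used definitions (RIV as the between-group share of the total sum of squares, CV as the sample standard deviation divided by the mean); the function name `riv_and_cv` and the toy costs are hypothetical and are not the study's data or code.

```python
from statistics import mean, stdev

def riv_and_cv(groups):
    """groups maps a DRG group label to the hospitalization costs of its cases."""
    all_costs = [c for costs in groups.values() for c in costs]
    grand_mean = mean(all_costs)
    # Total sum of squares around the overall mean cost
    ss_total = sum((c - grand_mean) ** 2 for c in all_costs)
    # Between-group sum of squares: distance of each group mean from the overall mean
    ss_between = sum(len(costs) * (mean(costs) - grand_mean) ** 2
                     for costs in groups.values())
    riv = ss_between / ss_total
    # Within-group CV (sample SD / mean); values below 1 suggest homogeneous groups
    cv = {label: stdev(costs) / mean(costs) for label, costs in groups.items()}
    return riv, cv

# Toy costs for two hypothetical groups (not data from the study)
riv, cv = riv_and_cv({"A": [1.0, 2.0, 3.0], "B": [7.0, 8.0, 9.0]})
print(round(riv, 3), cv)
```

A higher RIV means the grouping explains more of the cost variance, while a low within-group CV means cases in the same group cost roughly the same.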
2. Application of ARIMA Model based on Empirical Mode Decomposition in Pulmonary Tuberculosis Prediction in Shanxi Province
Jing LIU ; Ruiqing ZHAO ; Zhiyang ZHAO
Chinese Journal of Health Statistics 2025;42(2):175-179
Objective To explore the predictive performance of an autoregressive integrated moving average (ARIMA) model based on empirical mode decomposition (EMD) for the epidemic trend of pulmonary tuberculosis, providing methodological support for tuberculosis prediction and ideas for predicting other infectious diseases. Methods Monthly pulmonary tuberculosis incidence data for Shanxi Province from January 2008 to December 2018 were collected and organized. The last three, six, nine, and twelve months of the series were each held out as a test set to evaluate prediction performance, with the remaining data of the corresponding series used as the training set. An EMD-ARIMA model was constructed for prediction and compared with a single ARIMA model. Results The prediction errors of the EMD-ARIMA model for the next three, six, nine, and twelve months were all smaller than those of the ARIMA model. Conclusion Compared with a single ARIMA model, the EMD-ARIMA model improves the accuracy of pulmonary tuberculosis incidence prediction and provides an effective theoretical reference for disease prevention and control.
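The holdout design described above (the last 3, 6, 9 and 12 months used as test sets) can be sketched as follows. EMD decomposition and ARIMA fitting are not shown: `seasonal_naive` is a hypothetical placeholder forecaster standing in for either model, and RMSE and MAPE are assumed error metrics, since the abstract does not name the ones used. The synthetic series is invented.

```python
import math

def holdout_splits(series, horizons=(3, 6, 9, 12)):
    """Yield (horizon, train, test) with the last `horizon` points held out."""
    for h in horizons:
        yield h, series[:-h], series[-h:]

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def seasonal_naive(train, h, period=12):
    """Placeholder forecaster: repeat the value from the same month one season earlier."""
    return [train[len(train) - period + (i % period)] for i in range(h)]

# Synthetic monthly series of 132 points (Jan 2008 to Dec 2018 would be 132 months)
series = [100 + 10 * math.sin(2 * math.pi * t / 12) + 0.1 * t for t in range(132)]
for h, train, test in holdout_splits(series):
    pred = seasonal_naive(train, h)   # swap in ARIMA or EMD-ARIMA forecasts here
    print(h, round(rmse(test, pred), 3), round(mape(test, pred), 3))
```

Comparing two models then amounts to running both forecasters on the same splits and comparing their errors horizon by horizon.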
3. Development and Application of Caregiver Burden Scale for Primary Family Caregivers of Cancer Patients
Hanqi LI ; Donglan ZHENG ; Liuhui WANG
Chinese Journal of Health Statistics 2025;42(2):180-184
Objective To develop a multidimensional scale specific to primary family caregivers of cancer patients, verify its reliability and validity, and reveal the burden level of these caregivers. Methods Primary family caregivers of cancer patients at two medical institutions in Harbin were surveyed by questionnaire, and 571 valid questionnaires were recovered. Reliability and validity were analyzed with SPSS 27.0 and AMOS 28.0, and the critical values of the burden score were determined by the quartile method to grade burden levels. Results The revised scale contains 21 items in 5 dimensions. Cronbach's α coefficients of the total scale and each dimension ranged from 0.798~0.922, Omega coefficients from 0.806~0.918, and split-half reliability from 0.753~0.862. The average variance extracted (AVE) and composite reliability (CR) of all five dimensions met the criteria, and the correlation coefficient between every pair of dimensions was smaller than the square root of the AVE. Based on the critical values, 54.8% of caregivers experienced a moderate or heavy burden, with the financial burden and care burden dimensions scoring highest. Conclusion The burden scale for primary family caregivers of cancer patients has good reliability and validity. More than half of caregivers bear a relatively heavy burden, and policymakers should develop policies supporting reimbursement of medical expenses and hospice care services.
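The reliability and convergent/discriminant validity checks named above follow standard formulas, sketched below under the usual textbook definitions: Cronbach's α from item and total-score variances, AVE as the mean squared standardized loading, CR as (Σλ)² / ((Σλ)² + Σ(1 − λ²)), and the Fornell-Larcker rule that inter-dimension correlations stay below √AVE. The function names and toy inputs are hypothetical; in practice the loadings would come from a confirmatory factor analysis such as the one run in AMOS.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of item-score columns, each a list over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def ave_cr(loadings):
    """Standardized loadings of one dimension -> (AVE, CR)."""
    ave = sum(l ** 2 for l in loadings) / len(loadings)
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)           # residual variance per item
    return ave, s ** 2 / (s ** 2 + error)

def fornell_larcker_ok(ave, correlations):
    """True if every correlation with other dimensions is below sqrt(AVE)."""
    return all(abs(r) < ave ** 0.5 for r in correlations)

# Hypothetical responses (3 items x 5 respondents) and loadings
items = [[3, 4, 2, 5, 4], [3, 5, 2, 4, 4], [2, 4, 3, 5, 5]]
print(round(cronbach_alpha(items), 3))
ave, cr = ave_cr([0.72, 0.80, 0.68, 0.75])
print(round(ave, 3), round(cr, 3), fornell_larcker_ok(ave, [0.45, 0.52]))
```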
4. Application and Interpretability of the Unbalanced Ensemble Algorithm LASSO-EasyEnsemble in Prognostic Prediction of Coronary Heart Disease
Jiaxin ZAN ; Hong YANG ; Jing TIAN
Chinese Journal of Health Statistics 2025;42(2):197-203
Objective Given the high noise and class imbalance encountered in prognostic prediction of coronary heart disease, this study constructs an EasyEnsemble imbalanced ensemble model after LASSO feature selection and evaluates its performance. Methods Using survey data from the National Health and Nutrition Examination Survey (NHANES) public database for 2009-2018, with follow-up through 2019, the prognosis of coronary heart disease was predicted with death from the disease as the outcome. LASSO was used to select relevant features. An EasyEnsemble imbalanced ensemble model, along with SMOTE+LightGBM, XGBoost, and random forest models, was then constructed with the selected features, and grid search was used to tune each model's parameters. Classification performance was evaluated with AUC, precision, specificity, G-mean, and performance curves, and SHAP analysis was applied to interpret the model results. Results The EasyEnsemble model showed the best overall performance, with an AUC of 0.80 (95%CI: 0.79~0.82), precision of 0.86 (95%CI: 0.78~0.93), specificity of 0.99 (95%CI: 0.98~0.99), and G-mean of 0.79 (95%CI: 0.76~0.83), as also reflected in the performance curves. Age, serum phosphorus, diabetes, and albumin were identified as important factors influencing prognosis. Conclusion The LASSO-EasyEnsemble imbalanced ensemble model enables accurate prognostic prediction for patients with coronary heart disease, and combining it with SHAP can help clinicians assess disease severity and identify at-risk groups for personalized patient management.
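EasyEnsemble handles class imbalance by training one learner on each of several balanced subsets (all minority cases plus an equal-size random draw from the majority class) and combining them. In the sketch below the nearest-centroid base learner and the simple majority vote are assumptions chosen to keep it dependency-free (the original EasyEnsemble uses AdaBoost learners); LASSO selection, grid search and SHAP are omitted, and the toy data are invented. G-mean is the square root of sensitivity times specificity.

```python
import math
import random

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (class 1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class (stands in for AdaBoost)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_one(centroids, x):
    dist = {label: sum((a - b) ** 2 for a, b in zip(x, c)) for label, c in centroids.items()}
    return 1 if dist[1] < dist[0] else 0

def easy_ensemble(X, y, n_subsets=10, seed=0):
    """One learner per balanced subset: all minority cases (class 1 assumed minority)
    plus an equal-size random draw, without replacement, from the majority class."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    models = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_ensemble(models, x):
    """Majority vote across the subset learners."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0

X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
models = easy_ensemble(X, y)
preds = [predict_ensemble(models, x) for x in X]
print(preds, g_mean(y, preds))
```

Because every subset is balanced, no single learner is dominated by the majority class, while the ensemble still sees most majority cases across subsets.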
5. Analysis of the Fulfillment Index for Malignant Neoplastic Deaths among Urban Residents in 2020
Liye ZHOU ; Sijing CHEN ; Mengjiao SUN
Chinese Journal of Health Statistics 2025;42(2):171-174
Objective To analyze the impact of malignant tumors on the life expectancy of urban residents by age and sex. Methods The Fulfillment Index proposed by Prithwis Das Gupta was applied to the 2020 national mortality surveillance data for urban residents, analyzed by age and sex in Excel. Results In 2020, life expectancy at birth for the urban population was 81.21 years (78.80 years for males and 83.80 years for females). The loss of life expectancy due to malignant tumors was 2.90 years (3.33 years for males and 2.32 years for females). Deaths from malignant tumors were concentrated mainly in the 40~60 age range, and the impact on potential life expectancy increased with age, with the 60- age group experiencing the greatest loss of potential life expectancy due to malignant tumors. The Fulfillment Index differed between sexes: it peaked in the 60- age group for males (40.68) and in the 40- age group for females (47.85). In all age groups below 60, the Fulfillment Index for females was higher than that for males, indicating a trend toward earlier onset of malignant tumors in females. Conclusion The loss of life expectancy due to malignant tumors varies across age groups and sexes, calling for different prevention and treatment priorities. Health authorities should guide the public and raise awareness of malignant tumor prevention to help extend life expectancy.
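The abstract does not give the Fulfillment Index formula, so the sketch below shows a standard related quantity instead: the loss of life expectancy due to a cause, computed as the gain in life expectancy at birth when that cause's death rates are removed (a cause-deleted abridged life table). It assumes deaths occur at mid-interval, competing risks are independent, and the open-ended last interval keeps some other-cause mortality; all rates are hypothetical, and this is not necessarily how the authors obtained the 2.90-year figure.

```python
def life_expectancy(widths, mx, radix=100000.0):
    """Abridged life table e0. widths: interval lengths, None for the open last
    interval; mx: age-specific death rates per person-year."""
    l = radix
    total_L = 0.0                      # person-years lived, summed over intervals
    for n, m in zip(widths, mx):
        if n is None:                  # open-ended interval: L = l / m
            total_L += l / m
            break
        q = min(1.0, n * m / (1 + 0.5 * n * m))   # mx -> qx with mid-interval deaths
        d = l * q
        total_L += n * (l - 0.5 * d)
        l -= d
    return total_L / radix

def le_loss(widths, all_mx, cause_mx):
    """Years of life expectancy lost to a cause: e0 without the cause minus e0."""
    deleted = [m - c for m, c in zip(all_mx, cause_mx)]
    return life_expectancy(widths, deleted) - life_expectancy(widths, all_mx)

# Hypothetical intervals 0-1, 1-5, 5-10, 10-20, 20-40, 40-60, 60+
widths = [1, 4, 5, 10, 20, 20, None]
all_mx = [0.005, 0.0004, 0.0002, 0.0004, 0.001, 0.004, 0.05]
cause_mx = [0.0, 0.00002, 0.00002, 0.00003, 0.0001, 0.0012, 0.012]
print(round(life_expectancy(widths, all_mx), 2), round(le_loss(widths, all_mx, cause_mx), 2))
```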
6. Integrated Robust Clustering of Low-grade Gliomas Multi-omics Data based on Deep Learning
Gang DU ; Congcong JIA ; Xin ZHAO
Chinese Journal of Health Statistics 2025;42(2):185-190
Objective We proposed a new method that combines an autoencoder from deep learning with the optimally tuned robust improper maximum likelihood estimator (OTRIMLE) for robust clustering of multi-omics data, and applied it to clustering patients with lower-grade gliomas (LGG). Methods The miRNA, mRNA, and methylation data of LGG patients were reduced nonlinearly in dimension with an autoencoder, and OTRIMLE was then used for robust clustering. A Cox proportional hazards model was used to evaluate the prognostic risk of the clusters, and differentially expressed miRNAs (DEmiRNAs), differentially expressed mRNAs (DEmRNAs), and differentially methylated genes (DMGs) among clusters were screened by differential expression analysis. GO enrichment analysis was performed on the genes shared by the target genes of the DEmiRNAs, the DEmRNAs, and the DMGs. Finally, immune cell infiltration levels and pathway activity were compared across clusters. Results LGG patients were classified into four clusters, and the risk of death for patients in cluster 4 was 5.903 times that of patients in cluster 3. Eight DEmiRNAs, 2890 DEmRNAs, and 46 DMGs were identified, and the 658 overlapping genes obtained by joint analysis were enriched in 423 GO terms. Thirteen pathways with differential activity and 4 immune cell types with differential infiltration levels were identified. Conclusion The deep learning-based OTRIMLE method effectively handles noise, sparsity, and outliers in multi-omics data, achieving robust clustering of LGG patients. The identified immune cells and pathways provide a theoretical basis for subsequent targeted treatment of LGG.
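The core idea behind (OT)RIMLE is a Gaussian mixture augmented with an "improper" noise component of constant density δ, so that points far from every cluster are absorbed as noise instead of distorting the cluster estimates. The sketch below is a simplified one-dimensional EM under that idea, assuming δ is fixed by hand (OTRIMLE's actual contribution, the optimal tuning of δ, is not implemented), using a crude variance floor in place of OTRIMLE's eigenvalue constraint, and using toy 1-D values in place of the autoencoder's compressed features.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def rimle_em(xs, init_means, delta, n_iter=100, min_var=1e-2):
    """EM for a 1-D Gaussian mixture plus an improper uniform noise component.

    delta is the constant noise density (fixed here, tuned in OTRIMLE).
    Returned labels: 0 = noise, 1..k = clusters.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    props = [1.0 / (k + 1)] * (k + 1)      # props[0] is the noise proportion
    n = len(xs)
    for _ in range(n_iter):
        # E-step: posterior membership of each point in noise / each cluster
        resp = []
        for x in xs:
            w = [props[0] * delta] + [props[j + 1] * normal_pdf(x, means[j], variances[j])
                                      for j in range(k)]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: update proportions, means and floored variances
        props = [sum(r[j] for r in resp) / n for j in range(k + 1)]
        for j in range(k):
            rj = [r[j + 1] for r in resp]
            sj = sum(rj)
            if sj == 0:
                continue
            means[j] = sum(r * x for r, x in zip(rj, xs)) / sj
            var = sum(r * (x - means[j]) ** 2 for r, x in zip(rj, xs)) / sj
            variances[j] = max(min_var, var)   # crude stand-in for OTRIMLE's constraint
    labels = [max(range(k + 1), key=lambda j: r[j]) for r in resp]
    return means, variances, props, labels

# Two tight toy clusters near 0 and 10, plus one outlier at 50
xs = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2, 50.0]
means, variances, props, labels = rimle_em(xs, init_means=[1.0, 9.0], delta=1e-3)
print([round(m, 3) for m in means], labels)
```

The outlier ends up with label 0 because its Gaussian densities are negligible compared with the constant noise density, while the cluster means stay close to 0 and 10.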
7. Construction and Validation of Risk Prediction Model for Lung Adenocarcinoma based on TCGA Database
Mengyao GAO ; Huaxia MU ; Weixiao BU
Chinese Journal of Health Statistics 2025;42(2):191-196
Objective To screen key genes and clinical characteristics related to death or prognosis of patients with lung adenocarcinoma (LUAD) based on The Cancer Genome Atlas (TCGA) database, and to construct and validate a LUAD risk prediction model. Methods Clinical information and RNA sequencing data of LUAD patients were extracted from the TCGA database. Differentially expressed genes were screened, and hub genes were selected with a protein-protein interaction (PPI) network. 70% of the data was used as the training set, and the entire data set was used as the validation set. In the training set, elastic net regression was used to select prognostic genes and clinical characteristics, and multivariable Cox regression was used to build the risk prediction model. Predictive performance was evaluated with the area under the receiver operating characteristic curve (AUC), the C-index, and calibration curves, and the model was then verified in the validation set. Results Elastic net regression identified 23 factors associated with the survival status of LUAD patients. The variables finally included in the prediction model were 6 genes (SEC61A1 (P=0.004), MAP2K1 (P=0.026), MMP1 (P=0.001), SLC2A1 (P=0.010), B4GALT1 (P<0.001), and ERO1A (P=0.024)), M stage (P=0.003), and N stage (P<0.001). In the training and validation sets, the AUC was 0.764 and 0.710 and the C-index was 0.732 and 0.704, respectively. Time-dependent ROC (tdROC) curves and calibration curves showed high agreement between predicted and observed values. Kaplan-Meier survival curves showed that survival time in the low-risk group was significantly longer than in the high-risk group (P<0.05). Conclusion Low expression of 2 genes (SEC61A1, MAP2K1), high expression of 4 genes (MMP1, SLC2A1, B4GALT1, ERO1A), distant metastasis of the primary tumor, and more advanced lymph node metastasis were associated with significantly shorter survival in LUAD patients. The elastic net-based prognostic model has satisfactory predictive ability and can provide a scientific basis for predicting the risk of death in LUAD.
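Once a Cox model is fitted, each patient's risk score is its linear predictor, the C-index measures how well scores order observed deaths, and risk groups are often formed by splitting at the median score. The sketch below implements Harrell's C-index and a median split under those common conventions; the coefficients, covariate names (`gene_a`, `gene_b`) and patients are hypothetical, and the elastic net selection and Cox fitting steps are not shown.

```python
from statistics import median

def risk_score(features, coefs):
    """Cox linear predictor: sum of coefficient * covariate."""
    return sum(coefs[name] * features[name] for name in coefs)

def harrell_c(times, events, scores):
    """Harrell's C-index: share of usable pairs in which the patient who died
    first has the higher risk score (tied scores count 0.5)."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue                  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable

def split_by_median(scores):
    """Assign high/low risk groups at the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical coefficients and four patients (follow-up times, 1 = died, 0 = censored)
coefs = {"gene_a": 0.3, "gene_b": -0.2}
patients = [{"gene_a": 2.0, "gene_b": 1.0}, {"gene_a": 1.0, "gene_b": 1.5},
            {"gene_a": 0.5, "gene_b": 2.0}, {"gene_a": 0.1, "gene_b": 2.5}]
scores = [risk_score(p, coefs) for p in patients]
c_index = harrell_c([2, 5, 7, 9], [1, 1, 0, 1], scores)
print(round(c_index, 3), split_by_median(scores))
```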
8. An Approach for Sample Size Determination in Clinical Trials of Rare Diseases based on Bayesian Decision Theory
Nana CHEN ; Zhiwei RONG ; Yan HOU
Chinese Journal of Health Statistics 2025;42(2):162-165
Objective Traditional sample size estimation methods for clinical trials do not consider the size of the patient population to which the results will apply, and they use point estimates for the unknown true parameter values, which limits their usefulness in rare disease clinical trials. This article introduces a sample size estimation method based on Bayesian decision theory. Methods We propose a Tripartite Balanced Benefit Function (TBBF) and construct benefit function models that reflect the characteristics of acute and chronic diseases. The sample size of a clinical trial is determined by maximizing the expected benefit. Results A case analysis of hemophilia B demonstrated how the model is applied, and the sample size obtained by maximizing the expected benefit was feasible in practice. The method is well suited to estimating sample sizes for small-sample clinical trials. Conclusion TBBF makes full use of prior information, incorporates patient population size into the estimation process, and quantifies the interests of different stakeholders more clearly, making the decision-making process more scientific and interpretable.
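The general decision-theoretic recipe described above (average over a prior instead of plugging in a point estimate, then choose the sample size that maximizes expected benefit) can be sketched as follows. The TBBF itself is not specified in the abstract, so `expected_benefit` here is a hypothetical, much simpler benefit function: future patients gain only if the trial succeeds, and each enrolled patient has a cost. Success probability is the prior-averaged power ("assurance") of a one-sided two-arm z-test under a normal approximation; the prior and all numbers are invented.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided(n_per_arm, delta, sigma, alpha=0.025):
    """Normal-approximation power of a one-sided two-arm z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(delta * (n_per_arm / 2) ** 0.5 / sigma - z_alpha)

def assurance(n_per_arm, prior, sigma):
    """Power averaged over a discretized prior given as (delta, weight) pairs."""
    total_w = sum(w for _, w in prior)
    return sum(w * power_one_sided(n_per_arm, d, sigma) for d, w in prior) / total_w

def expected_benefit(n_per_arm, prior, sigma, patient_pop, gain, cost):
    """Hypothetical benefit (not TBBF): if the trial succeeds, each of `patient_pop`
    future patients gains `gain`; every enrolled patient costs `cost`."""
    return assurance(n_per_arm, prior, sigma) * patient_pop * gain - 2 * n_per_arm * cost

def optimal_n(candidates, **kwargs):
    """Candidate per-arm sample size with the highest expected benefit."""
    return max(candidates, key=lambda n: expected_benefit(n, **kwargs))

# Hypothetical prior on the effect size: N(0.5, 0.2) discretized on a grid
prior_dist = NormalDist(0.5, 0.2)
prior = [(d / 50, prior_dist.pdf(d / 50)) for d in range(-25, 76)]
settings = dict(prior=prior, sigma=1.0, patient_pop=2000, gain=1.0, cost=2.0)
n_star = optimal_n(range(5, 201, 5), **settings)
print(n_star, round(expected_benefit(n_star, **settings), 1))
```

Because the patient population size enters the benefit directly, a smaller rare-disease population lowers the payoff from success and typically pushes the optimal sample size down, which is the qualitative behaviour the abstract argues for.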
9. Application of Bayesian Poisson-logistic Joint Model in Assessing Underreporting Risk of Pulmonary Tuberculosis in Xinjiang
Zhichao LIANG ; Xinqi WANG ; Wanting XU
Chinese Journal of Health Statistics 2025;42(2):220-225
Objective A Bayesian Poisson-logistic joint model was constructed using tuberculosis (TB) reporting data from 14 prefectures in Xinjiang from 2014 to 2020, together with social, economic, and environmental factors affecting the reported TB incidence rate, to explore areas of potential underreporting in the TB reporting data and to provide evidence-based support for subsequent decisions on precise TB prevention and control. Methods Factors affecting the TB reporting process and disease process were collected, and important covariates were screened for inclusion in the model with the factor detector of the Geodetector method. A reported-incidence model and an expected-incidence model of TB in Xinjiang were constructed separately and together formed the underreporting model (the Poisson-logistic joint model). This joint model was used to estimate the risk of TB underreporting in each prefecture of Xinjiang and to explore its regional distribution. Results The factor detector showed that GDP per capita made the largest contribution to the risk of TB underreporting (0.5481). The goodness-of-fit test indicated that the data were well fitted (Bayesian P-value<0.001), so the Bayesian Poisson-logistic joint model could be applied to study the underreporting risk in the Xinjiang TB reporting data from 2014 to 2020. The risk of underreporting was concentrated in the four prefectures of southern Xinjiang, with the greatest risk in Kashgar [0.1426 (0.1403, 0.1445)]. Lower underreporting risk was concentrated in eastern and central Xinjiang, with the lowest risk in Karamay [0.1017 (0.9983, 0.1034)]. In the joint model, population density (IRR=1.0060, 95%CI: 1.0059~1.0061) and average annual temperature (IRR=1.0087, 95%CI: 1.0086~1.0088) were risk factors for TB underreporting, whereas higher GDP per capita (IRR=0.9385, 95%CI: 0.9365~0.9394) and a larger number of registered nurses (IRR=0.9916, 95%CI: 0.9913~0.9920) reduced the risk of TB underreporting. Conclusion The Bayesian Poisson-logistic joint model estimated the potential TB incidence in the Xinjiang Uygur Autonomous Region and revealed significant discrepancies between reported and true TB incidence rates. It identified underreporting trends and localized areas at potential risk of underreporting, providing a theoretical basis for tailored, precise TB prevention and control strategies in Xinjiang.
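A Poisson-logistic underreporting model typically links two parts: true cases follow a Poisson distribution with a log-linear rate (so exp(β) is an incidence rate ratio), and each true case is reported with a probability given by a logistic model. Reported counts are then a thinned Poisson variable with mean λπ, and 1 − π can be read as the underreporting risk. The sketch below only simulates this generative structure under those assumptions; the function names, covariate-free settings and numbers are hypothetical, the assignment of covariates to each part is not taken from the abstract, and the Bayesian (MCMC) fitting is not shown.

```python
import math
import random

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def true_incidence(log_pop, beta0, betas, x):
    """Log-linear (Poisson) part: expected true cases; exp(beta) is an IRR."""
    return math.exp(log_pop + beta0 + sum(b * v for b, v in zip(betas, x)))

def reporting_prob(gamma0, gammas, w):
    """Logistic part: probability that a true case is reported."""
    return expit(gamma0 + sum(g * v for g, v in zip(gammas, w)))

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the moderate lam used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate(rng, lam, pi):
    """Draw true cases, then report each independently with probability pi."""
    true_cases = sample_poisson(rng, lam)
    reported = sum(1 for _ in range(true_cases) if rng.random() < pi)
    return true_cases, reported

rng = random.Random(42)
lam = true_incidence(log_pop=math.log(1e5), beta0=math.log(20 / 1e5), betas=[], x=[])
pi = reporting_prob(gamma0=math.log(4), gammas=[], w=[])   # expit(log 4) = 0.8
draws = [simulate(rng, lam, pi) for _ in range(20000)]
mean_reported = sum(r for _, r in draws) / len(draws)
print(round(lam, 3), round(pi, 3), round(mean_reported, 2), "underreporting risk:", round(1 - pi, 3))
```

The simulated mean of reported cases should sit close to λπ, which is why the reported counts alone cannot separate λ from π; the joint model relies on covariates in each part to disentangle them.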
10. Selection of Step-in Dosing Regimen based on Bayesian Model in Early Clinical Trials
Zihan ZHU ; Zihang ZHONG ; Senmiao NI
Chinese Journal of Health Statistics 2025;42(2):166-170,174
Objective To explore a Bayesian logistic regression model for step-in dosing regimens (eBLRM) that accounts for the cumulative toxicity probability across dosing cycles in order to identify the maximum tolerated schedule (MTS). Methods The Bayesian logistic regression model (BLRM) was extended to obtain, from accumulated patient data, a posterior estimate of the cumulative toxicity probability through the last cycle, enabling exploration of dose sequences. Results The performance of eBLRM was evaluated by comparison with existing methods. Simulation results indicated that under low-toxicity scenarios eBLRM performed better than or comparably to the existing method in the proportion of correct MTS selection and the proportion of patients assigned to the true MTS. Under high-toxicity scenarios, eBLRM more often terminated trials early for safety reasons, resulting in slightly inferior performance compared with the existing method. Conclusion The eBLRM method performs relatively well and provides a simple, comprehensible dose-exploration approach for step-in dosing regimens.
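The schedule-level quantity described above can be sketched with a standard two-parameter BLRM, logit p(d) = log α + β·log(d/d*), evaluated on posterior draws. In this sketch the cumulative toxicity probability of a step-in schedule is computed by assuming independent toxicity events across cycles, and schedules are chosen by the usual BLRM interval rule (maximize the probability of the 0.16 to 0.33 target interval subject to P(overdose) < 0.25). These modelling choices, the function names, the posterior draws and the candidate schedules are all assumptions for illustration; they are not confirmed to match the authors' eBLRM.

```python
import math

def tox_prob(dose, ref_dose, log_alpha, log_beta):
    """Two-parameter BLRM: logit p = log(alpha) + beta * log(dose / ref_dose)."""
    eta = log_alpha + math.exp(log_beta) * math.log(dose / ref_dose)
    return 1.0 / (1.0 + math.exp(-eta))

def cumulative_tox(schedule, ref_dose, log_alpha, log_beta):
    """P(toxicity by the last cycle), assuming independent per-cycle events."""
    no_tox = 1.0
    for dose in schedule:
        no_tox *= 1.0 - tox_prob(dose, ref_dose, log_alpha, log_beta)
    return 1.0 - no_tox

def interval_probs(schedule, ref_dose, draws, target=(0.16, 0.33)):
    """Posterior probability of under-dosing, target toxicity and over-dosing."""
    lo, hi = target
    probs = [cumulative_tox(schedule, ref_dose, a, b) for a, b in draws]
    n = len(probs)
    under = sum(p < lo for p in probs) / n
    over = sum(p >= hi for p in probs) / n
    return under, 1.0 - under - over, over

def select_schedule(schedules, ref_dose, draws, max_overdose=0.25):
    """Schedule with the highest target-interval probability among those
    satisfying the overdose-control constraint (None if none qualifies)."""
    best, best_target = None, -1.0
    for schedule in schedules:
        _, p_target, p_over = interval_probs(schedule, ref_dose, draws)
        if p_over < max_overdose and p_target > best_target:
            best, best_target = schedule, p_target
    return best

# Hypothetical posterior draws of (log alpha, log beta) and candidate step-in schedules
draws = [(-3.0, 0.0), (-2.5, -0.2), (-2.8, 0.1), (-3.2, 0.2), (-2.2, -0.1)]
schedules = [(5, 10, 10), (5, 10, 20), (10, 20, 40)]
print(select_schedule(schedules, ref_dose=20, draws=draws))
```

In a real trial the draws would be refreshed by MCMC after each cohort, so the selected schedule can change as patient data accumulate.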
