1. Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran.
Lily TAPAK ; Hossein MAHJUB ; Omid HAMIDI ; Jalal POOROLAJAL
Healthcare Informatics Research 2013;19(3):177-185
OBJECTIVES: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) for classifying persons with and without diabetes. METHODS: The data set included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, a cross-sectional survey. The sample was obtained by cluster sampling of the Iranian population in a survey conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors commonly associated with diabetes were selected, and the performance of the six classifiers was compared in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve. RESULTS: Support vector machines showed the highest total accuracy (0.986) and the largest area under the ROC curve (0.979), together with high specificity (1.000) and sensitivity (0.820). All other methods achieved total accuracy above 85%, but their sensitivity values were very low (less than 0.350). CONCLUSIONS: The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranked first among the classifiers tested for predicting diabetes. This approach is therefore a promising classifier for predicting diabetes and should be further investigated for the prediction of other diseases.
Keywords: Cross-Sectional Studies; Data Mining; Developing Countries; Humans; Iran; Logistic Models; Mass Screening; Prevalence; Risk Factors; ROC Curve; Sensitivity and Specificity; Support Vector Machine
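The four evaluation criteria named in this abstract (sensitivity, specificity, total accuracy, and area under the ROC curve) can be illustrated with a minimal standard-library Python sketch. The function names, the toy labels and scores, and the 0.5 cut-off are hypothetical illustrations, not the study's data or code. The AUC is computed with the rank-based (Mann-Whitney) interpretation, which gives the same value as the trapezoidal area under the empirical ROC curve.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for labels coded 1 = with diabetes, 0 = without."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity, and total accuracy from hard (0/1) predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),        # share of true cases that are detected
        "specificity": tn / (tn + fp),        # share of non-cases correctly ruled out
        "accuracy": (tp + tn) / len(y_true),  # share of all subjects classified correctly
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random case scores above random non-case); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 3 persons with diabetes, 7 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.2, 0.3, 0.1, 0.1, 0.05, 0.2, 0.15, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off of 0.5

print(classification_metrics(y_true, y_pred))  # sensitivity ~0.333, specificity 1.0, accuracy 0.8
print(round(roc_auc(y_true, scores), 3))       # 0.881
```

In this toy example, specificity is perfect and accuracy is respectable while sensitivity is low, the same pattern of low sensitivity alongside high accuracy that the abstract reports for most of the classifiers. Sensitivity and specificity depend on the chosen cut-off; the AUC summarizes performance over all cut-offs.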
2. Prediction of Kidney Graft Rejection Using Artificial Neural Network.
Leili TAPAK ; Omid HAMIDI ; Payam AMINI ; Jalal POOROLAJAL
Healthcare Informatics Research 2017;23(4):277-284
OBJECTIVES: Kidney transplantation is the best renal replacement therapy for patients with end-stage renal disease. Several studies have attempted to identify predisposing factors for graft rejection; however, the results have been inconsistent. We aimed to identify prognostic factors associated with kidney transplant rejection using the artificial neural network (ANN) approach and to compare the results with those obtained by logistic regression (LR). METHODS: The study used data on 378 patients who had undergone kidney transplantation, drawn from a retrospective study conducted in Hamadan, Western Iran, from 1994 to 2011. ANN was used to identify potentially important risk factors for chronic nonreversible graft rejection. RESULTS: The ANN identified recipients' age, creatinine level, cold ischemic time, and hemoglobin level at discharge as the most important prognostic factors. The ANN model showed higher total accuracy (0.75 vs. 0.55 for LR) and a larger area under the ROC curve (0.88 vs. 0.75 for LR). CONCLUSIONS: The results of this study indicate that the ANN model outperformed LR in the prediction of kidney transplantation failure. This approach is therefore a promising classifier for predicting graft failure, with the aim of improving patients' survival and quality of life, and it should be further investigated for the prediction of other clinical outcomes.
Keywords: Causality; Cold Ischemia; Creatinine; Data Mining; Graft Rejection*; Humans; Iran; Kidney Failure, Chronic; Kidney Transplantation; Kidney*; Logistic Models; Quality of Life; Renal Replacement Therapy; Retrospective Studies; Risk Factors; ROC Curve; Transplants*
3. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.
Payam AMINI ; Saman MAROUFIZADEH ; Reza Omani SAMANI ; Omid HAMIDI ; Mahdi SEPIDARKISH
Osong Public Health and Research Perspectives 2017;8(3):195-200
OBJECTIVES: Preterm birth (PTB) is a leading cause of neonatal death and the second leading cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. METHODS: This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6–21, 2015. Data were collected with a researcher-developed questionnaire through interviews with mothers and review of their medical records. The accuracy of the logistic regression and decision tree methods was evaluated with several indices, including sensitivity, specificity, and the area under the curve. RESULTS: The PTB rate was 5.5% in this study. Logistic regression outperformed the decision tree in classifying PTB on the basis of risk factors. Logistic regression showed that multiple pregnancies, preeclampsia, and conception with assisted reproductive technology were associated with an increased risk of PTB (p < 0.05). CONCLUSION: Identifying and educating at-risk mothers, as well as improving prenatal care, may reduce the PTB rate. We also recommend that statisticians use the logistic regression model for classifying risk groups for PTB.
Keywords: Cause of Death; Child; Classification; Cross-Sectional Studies; Decision Trees*; Female; Humans; Infant; Iran*; Logistic Models*; Medical Records; Methods*; Mothers; Perinatal Death; Pre-Eclampsia; Pregnancy; Pregnancy, Multiple; Pregnant Women; Premature Birth*; Prenatal Care; Prevalence*; Reproductive Techniques, Assisted; Risk Factors; Sensitivity and Specificity
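With a PTB rate of only 5.5%, total accuracy by itself says little about how well a method detects PTB, which is one reason to report sensitivity, specificity, and the area under the curve. Total accuracy is a prevalence-weighted average of the two rates: accuracy = p · sensitivity + (1 − p) · specificity, where p is the prevalence. The short Python sketch below applies this identity; apart from the 5.5% rate taken from the abstract, the sensitivity and specificity values and the function name are hypothetical.

```python
def accuracy_from_rates(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Total accuracy as a prevalence-weighted average of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


ptb_rate = 0.055  # PTB rate reported in the Tehran study above

# Hypothetical rule that labels every birth as term: it detects no PTB
# (sensitivity 0) but raises no false alarms (specificity 1).
trivial = accuracy_from_rates(ptb_rate, sensitivity=0.0, specificity=1.0)

# Hypothetical classifier with weak sensitivity but high specificity.
weak = accuracy_from_rates(ptb_rate, sensitivity=0.30, specificity=0.98)

print(f"{trivial:.3f}")  # 0.945
print(f"{weak:.3f}")     # 0.943
```

Both hypothetical rules score above 94% accuracy even though neither is useful for finding mothers at risk, because the 94.5% of term births dominate the weighted average.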