1. Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran.
Lily TAPAK ; Hossein MAHJUB ; Omid HAMIDI ; Jalal POOROLAJAL
Healthcare Informatics Research 2013;19(3):177-185
OBJECTIVES: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) to classify persons with and without diabetes. METHODS: The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, obtained through a cross-sectional survey. The sample was based on cluster sampling of the Iranian population, conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of the six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve. RESULTS: Support vector machines showed the highest total accuracy (0.986) as well as the highest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 0.85, but their sensitivity values were very low (less than 0.350). CONCLUSIONS: The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
Cross-Sectional Studies; Data Mining; Developing Countries; Humans; Iran; Logistic Models; Mass Screening; Prevalence; Risk Factors; ROC Curve; Sensitivity and Specificity; Support Vector Machine
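As a reading aid for the criteria named in entry 1, the following minimal sketch computes sensitivity, specificity, and total accuracy from confusion-matrix counts. The counts are hypothetical and are not the study's data; they were chosen only to show how total accuracy can exceed 0.85 while sensitivity stays below 0.35 when positive cases are rare. The function name `classification_metrics` is an illustrative assumption.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity, and total accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),                # true positives among all actual positives
        "specificity": tn / (tn + fp),                # true negatives among all actual negatives
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # all correct calls among all subjects
    }


# Hypothetical counts (not from the study): 100 subjects with diabetes, 900 without.
metrics = classification_metrics(tp=30, fn=70, tn=880, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is dominated by the large negative class, which is why sensitivity is worth reporting separately in screening settings.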
2. Survival Analysis of Gastric Cancer Patients with Incomplete Data.
Abbas MOGHIMBEIGI ; Lily TAPAK ; Ghodaratolla ROSHANAEI ; Hossein MAHJUB
Journal of Gastric Cancer 2014;14(4):259-265
PURPOSE: Survival analysis of gastric cancer patients requires knowledge about factors that affect survival time. This paper attempted to analyze the survival of patients with incomplete registered data by using imputation methods. MATERIALS AND METHODS: Three missing data imputation methods, namely regression, the expectation maximization algorithm, and multiple imputation (MI) using Markov chain Monte Carlo methods, were applied to the data of cancer patients referred to the cancer institute at Imam Khomeini Hospital in Tehran from 2003 to 2008. The data included demographic variables, survival times, and censoring status of 471 patients with gastric cancer. After using imputation methods to account for missing covariate data, the data were analyzed using a Cox regression model and the results were compared. RESULTS: The mean patient survival time after diagnosis was 49.1 ± 4.4 months. In the complete case analysis, which used information from 100 of the 471 patients, very wide and uninformative confidence intervals were obtained for the chemotherapy and surgery hazard ratios (HRs). However, after imputation, the maximum confidence interval widths for the chemotherapy and surgery HRs were 8.470 and 0.806, respectively. The minimum width corresponded to MI. Furthermore, the minimum Bayesian and Akaike information criteria values corresponded to MI (-821.236 and -827.866, respectively). CONCLUSIONS: Missing value imputation increased the precision and accuracy of the estimates. In addition, MI yielded better results than the expectation maximization algorithm and simple regression imputation.
Diagnosis; Drug Therapy; Humans; Markov Chains; Proportional Hazards Models; Stomach Neoplasms*; Survival Analysis*
3. Factors associated with mortality from tuberculosis in Iran: an application of a generalized estimating equation-based zero-inflated negative binomial model to national registry data
Fatemeh SARVI ; Abbas MOGHIMBEIGI ; Hossein MAHJUB ; Mahshid NASEHI ; Mahmoud KHODADOST
Epidemiology and Health 2019;41(1):e2019032-
OBJECTIVES: Tuberculosis (TB) is a global public health problem that causes morbidity and mortality in millions of people per year. The purpose of this study was to examine the relationship of potential risk factors with TB mortality in Iran. METHODS: This cross-sectional study was performed on 9,151 patients with TB from March 2017 to March 2018 in Iran. Data were gathered from all 429 counties of Iran by the Ministry of Health and Medical Education and Statistical Center of Iran. In this study, a generalized estimating equation-based zero-inflated negative binomial model was used to determine the effect of related factors on TB mortality at the community level. For data analysis, R version 3.4.2 was used with the relevant packages. RESULTS: The risk of mortality from TB was found to increase with the unemployment rate (β̂=0.02), illiteracy (β̂=0.04), household density per residential unit (β̂=1.29), distance between the center of the county and the provincial capital (β̂=0.03), and urbanization (β̂=0.81). The following other risk factors for TB mortality were identified: diabetes (β̂=0.02), human immunodeficiency virus infection (β̂=0.04), infection with TB in the most recent 2 years (β̂=0.07), injection drug use (β̂=0.07), long-term corticosteroid use (β̂=0.09), malignant diseases (β̂=0.09), chronic kidney disease (β̂=0.32), gastrectomy (β̂=0.50), chronic malnutrition (β̂=0.38), and a body mass index more than 10% under the ideal weight (β̂=0.01). However, silicosis had no effect. CONCLUSIONS: The results of this study provide useful information on risk factors for mortality from TB.
Body Mass Index; Cross-Sectional Studies; Education, Medical; Family Characteristics; Gastrectomy; HIV; Humans; Iran; Literacy; Malnutrition; Models, Statistical; Mortality; Public Health; Renal Insufficiency, Chronic; Risk Factors; Silicosis; Statistics as Topic; Tuberculosis; Unemployment; Urbanization
5. Estimation of the Frequency of Intravenous Drug Users in Hamadan City, Iran, Using the Capture-recapture Method.
Salman KHAZAEI ; Jalal POOROLAJAL ; Hossein MAHJUB ; Nader ESMAILNASAB ; Mohammad MIRZAEI
Epidemiology and Health 2012;34(1):e2012006-
OBJECTIVES: The number of illicit drug users is prone to underestimation. This study aimed to use the capture-recapture method as a statistical procedure for measuring the prevalence of intravenous drug users (IDUs) by estimating the number of unknown IDUs not registered by any of the registry centers. METHODS: This study was conducted in Hamadan City, in western Iran, in 2012. Three incomplete data sources of IDUs, with partially overlapping data, were assessed: (a) Volunteer Counseling and Testing Centers (VCTCs); (b) Drop-in Centers (DICs); and (c) Outreach Teams (ORTs). A log-linear model was applied for the analysis of three-sample capture-recapture results. Two information criteria were used for model selection: Akaike's Information Criterion and the Bayesian Information Criterion. RESULTS: Out of 1,478 IDUs registered by the three centers, 48% were identified by VCTCs, 32% by DICs, and 20% by ORTs. After exclusion of duplicates, 1,369 IDUs remained. According to our findings, there were 9,964 (95% CI, 6,088 to 17,636) IDUs not identified by any of the centers. Hence, the real number of IDUs is expected to be 11,333. Based on these findings, the overall completeness of the three data sources was around 12% (95% CI, 7% to 18%). CONCLUSION: There was a considerable number of IDUs not identified by any of the centers. Although the capture-recapture method is a useful and practical approach for estimating unknown populations, the results must be interpreted with caution because of the assumptions and limitations of the method.
Counseling; Dacarbazine; Drug Users; Humans; Iran; Linear Models; Prevalence; Information Storage and Retrieval
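The totals reported in entry 5 can be checked with simple arithmetic. The sketch below assumes that completeness is defined as the number of registered IDUs divided by the estimated total, which the abstract implies but does not state explicitly; the variable names are illustrative. It reproduces the log-linear model's outputs only at the level of this final arithmetic, not the model fit itself.

```python
registered = 1369                # IDUs remaining after duplicates were excluded
unregistered = 9964              # point estimate of IDUs missed by all three sources
unregistered_ci = (6088, 17636)  # 95% CI for the unregistered count

# Estimated total population of IDUs.
total = registered + unregistered

# Completeness (assumed definition): registered / estimated total.
completeness = registered / total
# A larger unregistered count gives lower completeness, so the bounds swap.
completeness_low = registered / (registered + unregistered_ci[1])
completeness_high = registered / (registered + unregistered_ci[0])

print(total)
print(f"{completeness:.1%} ({completeness_low:.1%} to {completeness_high:.1%})")
```

Under this assumed definition, the rounded values agree with the figures quoted in the abstract (11,333 IDUs and about 12% completeness, with bounds near 7% and 18%).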
6. Diabetic peripheral neuropathy class prediction by multicategory support vector machine model: a cross-sectional study.
Maryam KAZEMI ; Abbas MOGHIMBEIGI ; Javad KIANI ; Hossein MAHJUB ; Javad FARADMAL
Epidemiology and Health 2016;38(1):e2016011-
OBJECTIVES: The worldwide prevalence of diabetes is rising toward epidemic levels. Diabetic neuropathy, one of the most common complications of diabetes mellitus, is a serious condition that can lead to amputation. This study used a multicategory support vector machine (MSVM) to predict diabetic peripheral neuropathy severity, classified into four categories, using patients' demographic characteristics and clinical features. METHODS: In this study, the data were collected at the Diabetes Center of Hamadan in Iran. Patients were enrolled by the convenience sampling method. Six hundred patients were recruited. After obtaining informed consent, a questionnaire collecting general information and a neuropathy disability score (NDS) questionnaire were administered. The NDS was used to classify the severity of the disease. We used MSVM with both one-against-all and one-against-one methods and three kernel functions (radial basis function (RBF), linear, and polynomial) to predict the class of disease with an unbalanced dataset. The synthetic minority oversampling technique (SMOTE) was used to improve model performance. To compare the performance of the models, mean accuracy was used. RESULTS: For predicting diabetic neuropathy, a classifier built from a balanced dataset and the RBF kernel function with a one-against-one strategy predicted the class to which a patient belonged with about 76% accuracy. CONCLUSIONS: The results of this study indicate that, in terms of overall classification accuracy, the MSVM model based on a balanced dataset can be useful for predicting the severity of diabetic neuropathy, and it should be further investigated for the prediction of other diseases.
Amputation; Classification; Cross-Sectional Studies*; Dataset; Diabetes Complications; Diabetic Neuropathies; Humans; Informed Consent; Iran; Logistic Models; Methods; Peripheral Nervous System Diseases*; Prevalence; Support Vector Machine*
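To illustrate the two multiclass strategies named in entry 6, the short sketch below enumerates the binary subproblems each one creates for four classes. The class labels are hypothetical placeholders, since the abstract does not name the four severity categories, and no classifier is trained here.

```python
from itertools import combinations

# Hypothetical labels for the four severity categories (not named in the abstract).
classes = ["class_1", "class_2", "class_3", "class_4"]

# One-against-one: one binary classifier per unordered pair of classes.
one_against_one = list(combinations(classes, 2))

# One-against-all: one binary classifier per class, against all remaining classes.
one_against_all = [(c, "rest") for c in classes]

print(len(one_against_one), len(one_against_all))
```

For k classes, one-against-one needs k(k-1)/2 binary classifiers and one-against-all needs k, so four classes give six and four subproblems respectively.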
7. Predicting Hospital Readmission in Heart Failure Patients in Iran: A Comparison of Various Machine Learning Methods
Roya NAJAFI-VOSOUGH ; Javad FARADMAL ; Seyed Kianoosh HOSSEINI ; Abbas MOGHIMBEIGI ; Hossein MAHJUB
Healthcare Informatics Research 2021;27(4):307-314
Objectives:
Heart failure (HF) is a common disease with a high hospital readmission rate. This study considered class imbalance and missing data, which are two common issues in medical data. The current study’s main goal was to compare the performance of six machine learning (ML) methods for predicting hospital readmission in HF patients.
Methods:
In this retrospective cohort study, information of 1,856 HF patients was analyzed. These patients were hospitalized in Farshchian Heart Center in Hamadan Province in Western Iran, from October 2015 to July 2019. The support vector machine (SVM), least-square SVM (LS-SVM), bagging, random forest (RF), AdaBoost, and naïve Bayes (NB) methods were used to predict hospital readmission. These methods’ performance was evaluated using sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Two imputation methods were also used to deal with missing data.
Results:
Of the 1,856 HF patients, 29.9% had at least one hospital readmission. Among the ML methods, LS-SVM performed the worst, with accuracy in the range of 0.57–0.60, while RF performed the best, with the highest accuracy (range, 0.90–0.91). Other ML methods showed relatively good performance, with accuracy exceeding 0.84 in the test datasets. Furthermore, the performance of the SVM and LS-SVM methods in terms of accuracy was higher with the multiple imputation method than with the median imputation method.
Conclusions:
This study showed that RF performed better, in terms of accuracy, than other methods for predicting hospital readmission in HF patients.