1. Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach
Eka MIRANDA ; Suko ADIARTO ; Faqir M. BHATTI ; Alfi Yusrotis ZAKIYYAH ; Mediana ARYUNI ; Charles BERNANDO
Healthcare Informatics Research 2023;29(3):228-238
Objectives:
The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. As a contribution to preventing this phenomenon, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the prediction model results based on the ML approach and deployed model-agnostic ML methods to identify informative features and their interpretations.
Methods:
We used a hematology Electronic Health Record (EHR) with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We examined the prediction model results based on the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of features to predictions.
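To make the attribution idea concrete, the following is a minimal sketch of the exact Shapley value computation that SHAP approximates. It does not use the SHAP library and is not the authors' pipeline: the scoring function `toy_score`, the three chosen features, the observation `x`, and the `baseline` values are all hypothetical and serve only to illustrate how a prediction is split into per-feature contributions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical score over (hemoglobin, leukocytes, age); not the paper's model."""
    hemoglobin, leukocytes, age = features
    return 0.5 * hemoglobin + 0.2 * leukocytes + 0.01 * age * hemoglobin


x = [15.0, 8.0, 60.0]          # hypothetical single observation
baseline = [13.0, 7.0, 50.0]   # hypothetical reference values

phi = shapley_values(toy_score, x, baseline)
for name, value in zip(["hemoglobin", "leukocytes", "age"], phi):
    print(f"{name}: {value:+.3f}")

# Efficiency property: contributions sum to the gap between the
# prediction for x and the prediction for the baseline.
print(abs(sum(phi) - (toy_score(x) - toy_score(baseline))) < 1e-9)
```

A positive value means the feature pushed this observation's score above the baseline score, which is the sense in which a local SHAP bar plot reports a "positive impact".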
Results:
Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot method, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot revealed that hemoglobin and mean corpuscular hemoglobin concentration had positive impacts on AHD prediction based on a single observation.
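As a reading aid, the sketch below shows how accuracy, precision, recall, and F1-score are derived from a binary confusion matrix, and checks that the reported F1-score is consistent with the reported precision and recall. The function names and the confusion-matrix counts are hypothetical; the abstract does not report the underlying counts, so only the 0.82 and 0.88 figures come from the source.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the reported AdaBoost figures (precision 0.82, recall 0.88).
print(round(f1_from(0.82, 0.88), 2))  # 0.85

# Hypothetical counts, for illustration only (not the study's confusion matrix).
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(example)
```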
Conclusions:
ML models based on real clinical data can be used to predict AHD.
2. Detection of Cardiovascular Disease Risk's Level for Adults Using Naive Bayes Classifier
Eka MIRANDA ; Edy IRWANSYAH ; Alowisius Y AMELGA ; Marco M MARIBONDANG ; Mulyadi SALIM
Healthcare Informatics Research 2016;22(3):196-205
OBJECTIVES: The number of deaths caused by cardiovascular disease and stroke is predicted to reach 23.3 million in 2030. As a contribution to preventing this outcome, this paper proposes a data mining model using a naïve Bayes classifier that detects cardiovascular disease and identifies its risk level for adults.
METHODS: The design process began by identifying knowledge about the cardiovascular disease profile and the levels of cardiovascular disease risk factors for adults from medical records, and then designing a mining model based on a naïve Bayes classifier. The research was evaluated in two ways: by calculating accuracy, sensitivity, and specificity, and through an evaluation session with cardiologists and internists. Cardiovascular disease was characterized by its primary risk factors: diabetes mellitus, blood lipid levels, coronary artery function, and kidney function. Class labels were assigned according to the values of these factors: risk level 1, risk level 2, and risk level 3.
RESULTS: The evaluation of classifier performance (accuracy, sensitivity, and specificity) showed that the proposed model predicted the class labels of tuples correctly (above 80%). More than 80% of the respondents who participated in the evaluation session (including cardiologists and internists) agreed or strongly agreed that this research followed medical procedures and that the results can support medical analysis related to cardiovascular disease.
CONCLUSIONS: The research showed that the proposed model achieves good performance for risk level detection of cardiovascular disease.
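For readers unfamiliar with the technique, here is a minimal sketch of a categorical naïve Bayes classifier with Laplace smoothing that assigns one of three risk levels. It is an illustration under stated assumptions, not the authors' model: the class `CategoricalNaiveBayes`, the three features (diabetes, lipid level, kidney function), their category values, and the six training rows are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[i][label][value] -> occurrences of value for feature i
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[i][label][value] += 1
                self.values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalized log posterior: log prior plus summed log likelihoods."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = log(count / total)
            for i, value in enumerate(row):
                numerator = self.value_counts[i][label][value] + self.alpha
                denominator = count + self.alpha * len(self.values[i])
                score += log(numerator / denominator)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Hypothetical records: (diabetes, lipid level, kidney function) -> risk level
rows = [
    ("no", "normal", "normal"),
    ("no", "normal", "normal"),
    ("no", "high", "normal"),
    ("yes", "normal", "normal"),
    ("yes", "high", "impaired"),
    ("yes", "high", "impaired"),
]
labels = [1, 1, 2, 2, 3, 3]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("no", "normal", "normal")))
print(model.predict(("yes", "high", "impaired")))
```

The smoothing term keeps a category that never co-occurred with a class in training from forcing that class's probability to zero, which matters when medical-record samples per risk level are small.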
Adult*; Bayes Theorem; Bays*; Cardiovascular Diseases*; Classification; Coronary Vessels; Data Mining; Diabetes Mellitus; Diagnostic Techniques, Cardiovascular; Humans; Kidney; Medical Records; Methods; Mining; Risk Factors; Sensitivity and Specificity; Stroke; Surveys and Questionnaires
