1.Semi-supervised Long-tail Endoscopic Image Classification.
Run-Nan CAO ; Meng-Jie FANG ; Hai-Ling LI ; Jie TIAN ; Di DONG
Chinese Medical Sciences Journal 2022;37(3):171-180
Objective To explore a semi-supervised learning (SSL) algorithm for long-tail endoscopic image classification with limited annotations. Method We explored semi-supervised long-tail endoscopic image classification in HyperKvasir, the largest public gastrointestinal dataset, with 23 diverse classes. The semi-supervised learning algorithm FixMatch, which is based on consistency regularization and pseudo-labeling, was applied. After splitting the data into training and test sets at a ratio of 4:1, we sampled 20%, 50%, and 100% of the labeled training data to test classification with limited annotations. Results Classification performance was evaluated with micro-average and macro-average metrics, with the Matthews correlation coefficient (MCC) as the overall measure. The SSL algorithm improved classification performance, with MCC increasing from 0.8761 to 0.8850, from 0.8983 to 0.8994, and from 0.9075 to 0.9095 with 20%, 50%, and 100% of the labeled training data, respectively. With 20% labeled training data, SSL improved both micro-average and macro-average performance, whereas with 50% and 100%, SSL improved micro-average performance but hurt macro-average performance. Analysis of the confusion matrix and of the labeling bias in each class showed that the pseudo-labeling-based SSL algorithm exacerbated the classifier's preference for the head classes, improving performance on the head classes and degrading performance on the tail classes. Conclusion SSL can improve performance in semi-supervised long-tail endoscopic image classification, especially when labeled data are extremely limited, which may benefit the building of assisted diagnosis systems for low-volume hospitals. However, the pseudo-labeling strategy may amplify the effect of class imbalance, which hurts classification performance on the tail classes.
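The pseudo-labeling step that the abstract attributes to FixMatch can be sketched as follows. This is a minimal, framework-free illustration, not the authors' code: the confidence threshold of 0.95 is an assumed default (the abstract does not state one), and the probability lists stand in for a model's softmax outputs on weakly and strongly augmented views of the same unlabeled image.

```python
import math
from typing import List


def fixmatch_unlabeled_loss(
    weak_probs: List[List[float]],
    strong_probs: List[List[float]],
    threshold: float = 0.95,  # assumed confidence cutoff, not taken from the study
) -> float:
    """FixMatch-style unlabeled loss for one batch (illustrative sketch).

    weak_probs[i]   -- class probabilities for a weakly augmented view of image i
    strong_probs[i] -- class probabilities for a strongly augmented view of image i
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for p_weak, p_strong in zip(weak_probs, strong_probs):
        confidence = max(p_weak)
        if confidence < threshold:
            continue  # low-confidence images are masked out (zero loss)
        pseudo_label = p_weak.index(confidence)  # hard pseudo-label = argmax class
        # Consistency term: cross-entropy of the strong view against the pseudo-label
        total += -math.log(p_strong[pseudo_label])
    # Normalized by the full unlabeled batch size, as in FixMatch
    return total / len(weak_probs)


weak = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20]]
strong = [[0.80, 0.10, 0.10], [0.20, 0.50, 0.30]]
print(fixmatch_unlabeled_loss(weak, strong))  # only the first image passes the threshold
```

Because only confident predictions become pseudo-labels, a classifier that is already more confident on head classes admits more head-class pseudo-labels, which is one plausible route to the head-class bias the authors report.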
Supervised Machine Learning
;
Algorithms
2.Heart Alert: A heart disease prediction system using machine learning approach and optimization techniques
Justin Allen P. Denopol ; Ma. Sheila A. Magboo ; Vincent Peter C. Magboo
Philippine Journal of Health Research and Development 2022;26(3):83-92
Background:
Cardiovascular diseases are among the top three leading causes of mortality in the Philippines, accounting for 17.8% of total deaths. Lifestyle-related habits such as alcohol consumption, smoking, poor diet and nutrition, high levels of sedentary behavior, overweight, and obesity have been increasingly implicated in the high rates of heart disease among Filipinos, placing a significant burden on the country's healthcare system. The objective of this study was to predict the presence of heart disease using various machine learning algorithms (support vector machine, naïve Bayes, random forest, logistic regression, decision tree, and adaptive boosting) evaluated on an anonymized, publicly available cardiovascular disease dataset.
Methodology:
Various machine learning algorithms were applied to an anonymized, publicly available cardiovascular dataset from a machine learning data repository (IEEE DataPort). A web-based application named Heart Alert, which predicts the risk of developing heart disease, was developed based on the best machine learning model. The effects of different optimization techniques on the classification performance of the algorithms were assessed, namely imputation methods (mean, median, mode, and multiple imputation by chained equations) and a feature selection method (recursive feature elimination). All simulation experiments were implemented in Python 3.8 and its machine learning libraries (scikit-learn, Keras, TensorFlow, pandas, Matplotlib, Seaborn, NumPy).
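The three single-value imputation strategies named above (mean, median, and mode) can be sketched with the Python standard library alone. This is an illustrative helper with a hypothetical name, `impute_column`; multiple imputation by chained equations is not covered, and `None` is assumed to mark a missing value.

```python
import statistics
from typing import List, Optional


def impute_column(values: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace missing entries (None) in one feature column with a single statistic.

    Hypothetical helper for illustration; the statistic is computed from the
    observed (non-missing) values only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)  # most frequent observed value
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


column = [1.0, None, 2.0, 2.0, 5.0]  # toy values, not from the dataset
for strategy in ("mean", "median", "mode"):
    print(strategy, impute_column(column, strategy))
```

In a real pipeline, the fill statistic would normally be computed on the training split only and then reused for the test split, so that no information leaks from test data into training.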
Results:
The support vector machine without imputation or feature selection achieved the highest performance (90.2% accuracy, 87.7% sensitivity, 93.6% specificity, 94.9% precision, 91.2% F1-score, and an area under the receiver operating characteristic curve of 0.902) and was used to implement the heart disease prediction system (Heart Alert). Following very closely, and also performing well, were random forest with mean or median imputation and logistic regression with mode imputation, all without feature selection.
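The reported metrics all derive from the four cells of a binary confusion matrix. The sketch below (hypothetical function names; the counts in the usage line are toy values, not the study's) shows the standard definitions, and the F1 helper confirms that the reported precision (94.9%) and sensitivity (87.7%) are consistent with the reported F1-score of 91.2%.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from a binary confusion matrix (positive = heart disease)."""
    sensitivity = tp / (tp + fn)  # recall / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


# Consistency check using the values reported for the support vector machine
print(round(f1_score(0.949, 0.877), 3))  # 0.912
print(binary_metrics(tp=40, fp=5, tn=45, fn=10))  # toy counts
```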
Conclusion:
The performance of the best four machine learning models suggests that, for this dataset, imputation of missing values is optional. Likewise, recursive feature elimination may be unnecessary, as all variables appear to be important in heart disease prediction. An early, accurate diagnosis leading to prompt intervention is crucial, as it improves the patient's quality of life and reduces the risk of cardiac events.
Machine Learning
;
Support Vector Machine
3.Population Pharmacokinetic and Pharmacodynamic Models of Propofol in Healthy Volunteers using NONMEM and Machine Learning Methods.
Yoo Mi KIM ; Sung Hong KANG ; Il Su PARK ; Gyu Jeong NOH
Journal of Korean Society of Medical Informatics 2008;14(2):147-159
OBJECTIVES: The primary objective of this study was to compare the performance of machine learning methods with that of a nonlinear mixed effects model previously developed using NONMEM(R) for pharmacokinetic and pharmacodynamic data on propofol. The secondary objective was to evaluate whether a pharmacodynamic model describing the relationship between the dose of propofol and the bispectral index (BIS) outperforms one describing the relationship between BIS and propofol concentrations predicted by a pharmacokinetic model. METHODS: Data were collected during a study involving the infusion of propofol into healthy volunteers. Pharmacokinetic and pharmacodynamic models were constructed using artificial neural networks (ANNs), support vector machines (SVMs), and multi-method ensembles, and were compared with the nonlinear mixed effects method as implemented in NONMEM(R). Model performance was assessed by goodness-of-fit statistics, paired t-tests between predicted and observed values for each model, and scatterplots. RESULTS: In the pharmacokinetic analysis, ensemble I, the mean of the ANN and NONMEM(R) predictions, achieved the lowest error and the highest correlation coefficient. SVM produced the highest error and the lowest correlation coefficient. In the pharmacodynamic analysis, ANN exhibited the best performance. An ANN model describing the relationship between the dose of propofol and BIS was not inferior to an ANN model describing the relationship between BIS and propofol concentrations predicted by an ANN pharmacokinetic model. CONCLUSIONS: In the pharmacokinetic analysis, an ensemble combined with ANN achieved slightly better performance than NONMEM(R). The relationship between the dose of propofol and BIS can be predicted without considering the pharmacokinetics of propofol.
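Ensemble I, described above as the mean of the ANN and NONMEM(R) predictions, amounts to averaging two prediction vectors before computing goodness-of-fit statistics. A minimal sketch follows; the choice of root-mean-square error and Pearson's r as the fit statistics, the function names, and the toy concentrations are assumptions for illustration.

```python
import math
from typing import List, Sequence


def ensemble_mean(pred_a: Sequence[float], pred_b: Sequence[float]) -> List[float]:
    """Element-wise mean of two models' predictions (e.g. ANN and NONMEM)."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


observed = [2.0, 3.0, 4.0, 3.5]  # toy concentrations
ann_pred = [1.8, 3.4, 4.1, 3.0]
nonmem_pred = [2.4, 2.8, 3.7, 3.8]
ensemble = ensemble_mean(ann_pred, nonmem_pred)
for name, pred in [("ANN", ann_pred), ("NONMEM", nonmem_pred), ("ensemble", ensemble)]:
    print(name, round(rmse(pred, observed), 3), round(pearson_r(pred, observed), 3))
```

Averaging tends to help when the two models' errors are not strongly correlated, since their deviations partly cancel.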
Machine Learning
;
Propofol
;
Support Vector Machine
4.Prediction of trends for fine-scale spread of Oncomelania hupensis in Shanghai Municipality based on supervised machine learning models.
Yan Feng GONG ; Zhuo Wei LUO ; Jia Xin FENG ; Jing Bo XUE ; Zhao Yu GUO ; Yan Jun JIN ; Qing YU ; Shang XIA ; Shan LÜ ; Jing XU ; Shi Zhu LI
Chinese Journal of Schistosomiasis Control 2022;34(3):241-251
OBJECTIVE:
To predict the trends for fine-scale spread of Oncomelania hupensis based on supervised machine learning models in Shanghai Municipality, so as to provide insights into precision O. hupensis snail control.
METHODS:
Based on the 2016 O. hupensis snail survey data for Shanghai Municipality and on climatic, geographical, vegetation, and socioeconomic data related to O. hupensis snail distribution, seven supervised machine learning models were created to predict the risk of snail spread in Shanghai: decision tree, random forest, generalized boosted model, support vector machine, naive Bayes, k-nearest neighbor, and C5.0. The performance of the seven models in predicting snail spread was evaluated with the area under the receiver operating characteristic curve (AUC), F1-score, and accuracy, and the optimal models were selected to identify the environmental variables affecting snail spread and to predict the areas at risk of snail spread in Shanghai Municipality.
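AUC, the first of the three evaluation measures listed above, equals the probability that a randomly chosen positive site receives a higher predicted risk than a randomly chosen negative site. The pairwise sketch below computes it directly from that definition (ties count as one half); the function name and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = snail spread observed at a site, 0 = not observed (toy data)
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The double loop is quadratic in the number of sites; for large grids, a rank-based formulation gives the same value in O(n log n).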
RESULTS:
Seven supervised machine learning models were successfully created to predict the risk of snail spread in Shanghai Municipality, and random forest (AUC = 0.901, F1-score = 0.840, ACC = 0.797) and generalized boosted model (AUC = 0.889, F1-score = 0.869, ACC = 0.835) showed higher predictive performance than the other models. Random forest analysis showed that the four most important climatic variables contributing to snail spread in Shanghai were aridity (11.87%), ≥ 0 °C annual accumulated temperature (10.19%), moisture index (10.18%), and average annual precipitation (9.86%), and that the two most important vegetation variables were the vegetation index of the first quarter (8.30%) and the vegetation index of the second quarter (7.69%). Snails were more likely to spread at an aridity of < 0.87, a ≥ 0 °C annual accumulated temperature of 5 550 to 5 675 °C, a moisture index of > 39%, an average annual precipitation of > 1 180 mm, a vegetation index of the first quarter of > 0.4, and a vegetation index of the second quarter of > 0.6. According to the water resource development and township administrative maps, the areas at risk of snail spread were predicted to lie mainly in 10 townships/subdistricts, covering the Xipian, Dongpian and Tainan sections of southern Shanghai.
CONCLUSIONS:
Supervised machine learning models are effective for predicting the risk of fine-scale O. hupensis snail spread and for identifying the environmental determinants of snail spread. The areas at risk of O. hupensis snail spread are mainly located in southwestern Songjiang District, northwestern Jinshan District, and southeastern Qingpu District of Shanghai Municipality.
Animals
;
Bayes Theorem
;
China/epidemiology*
;
Ecosystem
;
Gastropoda
;
Supervised Machine Learning
5.Arousal and Valence Classification Model Based on Long Short-Term Memory and DEAP Data for Mental Healthcare Management.
Eun Jeong CHOI ; Dong Keun KIM
Healthcare Informatics Research 2018;24(4):309-316
OBJECTIVES: Both the valence and arousal components of affect are important considerations in mental healthcare management because they are associated with affective and physiological responses. Research on arousal and valence analysis using deep learning on images, texts, and physiological signals is actively underway, and research on how to improve the recognition rate is needed. The goal of this research was to design a deep learning framework and model to classify arousal and valence, indicating positive and negative degrees of emotion, as high or low. METHODS: The proposed arousal and valence classification model for analyzing affective state was tested using 40-channel data from the DEAP dataset, a dataset for emotion analysis using electroencephalography (EEG), physiological, and video signals. Experiments were based on 10 selected features from central and peripheral nervous system data, using long short-term memory (LSTM) as the deep learning method. RESULTS: Arousal and valence were classified and visualized on a two-dimensional coordinate plane. Profiles were designed according to the error rate, varying the number of hidden layers, the number of nodes, and other hyperparameters. The experimental results showed classification accuracies of 74.65% for arousal and 78% for valence. The proposed model performed better than previous models. CONCLUSIONS: The proposed model appears to be effective in analyzing arousal and valence; specifically, affective analysis of physiological signals using LSTM is expected to be possible without manual feature extraction. In a future study, the classification model will be adopted in mental healthcare management systems.
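The gating arithmetic of an LSTM, the deep learning method used above, can be sketched for a single hidden unit in plain Python. The weights, the feature sequence, and the 0.5 decision threshold for high versus low below are placeholders chosen for illustration; they are not the study's trained model, which used multiple hidden layers and nodes.

```python
import math
from typing import Dict, Sequence, Tuple

# Gate parameters: (input weights, recurrent weight, bias) for a single hidden unit
GateParams = Tuple[Sequence[float], float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x: Sequence[float], h_prev: float, c_prev: float, params: Dict[str, GateParams]
) -> Tuple[float, float]:
    """One time step of a single-unit LSTM cell; params has keys "f", "i", "o", "g"."""

    def pre_activation(name: str) -> float:
        w, u, b = params[name]
        return sum(wi * xi for wi, xi in zip(w, x)) + u * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: how much old memory to keep
    i = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    o = sigmoid(pre_activation("o"))    # output gate: how much memory to expose
    g = math.tanh(pre_activation("g"))  # candidate memory content
    c = f * c_prev + i * g              # updated cell state
    h = o * math.tanh(c)                # updated hidden state
    return h, c


def classify_high_low(
    sequence: Sequence[Sequence[float]],
    params: Dict[str, GateParams],
    w_out: float,
    b_out: float,
    threshold: float = 0.5,  # placeholder decision threshold
) -> str:
    """Run the cell over a feature sequence and threshold a sigmoid readout of the final h."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return "high" if sigmoid(w_out * h + b_out) >= threshold else "low"


# Toy run: 3 time steps with 2 features each (the study selected 10 features)
toy_params = {name: ([0.5, -0.3], 0.1, 0.0) for name in ("f", "i", "o", "g")}
toy_sequence = [[0.2, 0.1], [0.4, -0.2], [0.9, 0.3]]
print(classify_high_low(toy_sequence, toy_params, w_out=2.0, b_out=0.0))
```

The cell state `c` is what lets the network carry information across long signal windows without hand-crafted summary features; separate readouts would be trained for arousal and for valence.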
Arousal*
;
Classification*
;
Dataset
;
Delivery of Health Care*
;
Electrocardiography
;
Learning
;
Machine Learning
;
Memory, Short-Term*
;
Methods
;
Peripheral Nervous System
;
Supervised Machine Learning
6.Application of Deep Learning System into the Development of Communication Device for Quadriplegic Patient
Jung Hwan LEE ; Taewoo KANG ; Byung Kwan CHOI ; In Ho HAN ; Byung Chul KIM ; Jung Hoon RO
Korean Journal of Neurotrauma 2019;15(2):88-94
OBJECTIVE: Quadriplegic patients generally use their voices to call a caregiver. However, patients with severe quadriplegia who have undergone tracheostomy cannot produce a voice and need other communication tools to call caregivers. Recently, monitoring of eye status using artificial intelligence (AI) has been widely used in various fields. We built an eye status monitoring system using deep learning and developed a communication system with which quadriplegic patients can call a caregiver. METHODS: The communication system consists of 3 programs. The first program was developed to automatically capture eye images from the face using a webcam; it continuously captured and stored 15 eye images per second. Second, the captured eye images were classified as open or closed by deep learning, a type of AI. Google TensorFlow was used as the machine learning library for the convolutional neural network, and a total of 18,000 images were used to train the deep learning system. Finally, a program was developed to produce a sound when the left eye was closed for 3 seconds. RESULTS: The test accuracy of eye status classification was 98.7%. In practice, when the quadriplegic patient looked at the webcam and closed his left eye for 3 seconds, the sound calling a caregiver was generated. CONCLUSION: Our AI-based eye status detection software is highly accurate, and the calling system for the quadriplegic patient was satisfactory.
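At the stated capture rate of 15 eye images per second, a 3-second closure corresponds to 45 consecutive frames classified as closed. The sketch below shows one way the third program's trigger rule could work on a stream of per-frame predictions; the function name and the reset-on-open behavior are assumptions, and the CNN classifier itself is not shown.

```python
from typing import Iterable, Optional

FRAMES_PER_SECOND = 15  # capture rate reported in the abstract
HOLD_SECONDS = 3        # left-eye closure time that triggers the call
REQUIRED_FRAMES = FRAMES_PER_SECOND * HOLD_SECONDS  # 45 consecutive frames


def first_alarm_frame(left_eye_closed: Iterable[bool]) -> Optional[int]:
    """Return the index of the frame at which the caregiver call fires, or None.

    left_eye_closed yields one CNN prediction per frame (True = closed).
    Any open-eye frame resets the count, so separate blinks do not accumulate.
    """
    consecutive = 0
    for index, closed in enumerate(left_eye_closed):
        consecutive = consecutive + 1 if closed else 0
        if consecutive == REQUIRED_FRAMES:
            return index
    return None


stream = [False] * 10 + [True] * 45  # 10 open frames, then a 3-second closure
print(first_alarm_frame(stream))  # 54
```

Requiring an unbroken run rather than a total count is what keeps ordinary blinking from triggering the call.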
Artificial Intelligence
;
Caregivers
;
Humans
;
Learning
;
Machine Learning
;
Quadriplegia
;
Tracheostomy
;
Unsupervised Machine Learning
;
Voice
7.Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm
Jae Hong LEE ; Do hyung KIM ; Seong Nyum JEONG ; Seong Ho CHOI
Journal of Periodontal & Implant Science 2018;48(2):114-123
PURPOSE: The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). METHODS: Combining a pretrained deep CNN architecture with a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, implemented with the Keras framework in Python. RESULTS: The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%–91.2%) for premolars and 73.4% (95% CI, 59.9%–84.0%) for molars. CONCLUSIONS: We demonstrated that the deep CNN algorithm was useful for the diagnosis and prediction of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.
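The reported split (1,044 training, 348 validation, and 348 test images out of 1,740) corresponds to a 60/20/20 partition. A minimal shuffled split reproducing those counts is sketched below; the seed, the function name, and the use of a simple unstratified random split are assumptions, since the abstract does not describe how images were assigned.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_20_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split items into 60% training, 20% validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n = len(shuffled)
    n_train = n * 3 // 5
    n_val = n // 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no item is dropped
    return train, val, test


train, val, test = split_60_20_20(range(1740))
print(len(train), len(val), len(test))  # 1044 348 348
```

When several radiographs come from the same patient, splitting by patient rather than by image keeps the same patient out of both the training and test sets.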
Area Under Curve
;
Artificial Intelligence
;
Bicuspid
;
Boidae
;
Dataset
;
Diagnosis
;
Learning
;
Machine Learning
;
Methods
;
Molar
;
Periodontal Diseases
;
ROC Curve
;
Sensitivity and Specificity
;
Supervised Machine Learning
;
Tooth
;
Weights and Measures
8.Augmentation of Doppler Radar Data Using Generative Adversarial Network for Human Motion Analysis
Ibrahim ALNUJAIM ; Youngwook KIM
Healthcare Informatics Research 2019;25(4):344-349
OBJECTIVES: Human motion analysis can be applied to the diagnosis of musculoskeletal diseases, rehabilitation therapies, fall detection, and estimation of energy expenditure. Deep learning is one of the most effective approaches to analyzing human motion with micro-Doppler signatures measured by radar. Because deep learning requires a large data set, the high cost of measuring large amounts of human data is an intrinsic problem. The objective of this study was to augment human motion micro-Doppler data using generative adversarial networks (GANs) to improve the accuracy of human motion classification. METHODS: To test data augmentation by GANs, authentic data for 7 human activities were collected using micro-Doppler radar, yielding 144 data samples per motion. Software including the GPU driver, the CUDA and cuDNN libraries, and Anaconda was installed to train the GANs, and Keras-GPU, SciPy, Pillow, OpenCV, Matplotlib, and Git were used to create an Anaconda environment. The data produced by the GANs were saved every 300 epochs, and training was stopped at 3,000 epochs. The images generated from each epoch were evaluated, and the best images were selected. RESULTS: Each micro-Doppler data set of 144 samples was augmented to produce 1,472 synthesized 64 × 64 spectrograms. Training the deep neural network with the augmented spectrograms increased the accuracy of human motion classification. CONCLUSIONS: Data augmentation to increase the amount of training data was successfully conducted using GANs. Thus, augmented micro-Doppler data can contribute to improving the accuracy of human motion recognition.
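The adversarial objective behind GAN-based augmentation can be summarized by two losses computed from the discriminator's outputs. The sketch below uses the standard binary cross-entropy formulation with the common non-saturating generator loss; the abstract does not state which GAN variant or loss the authors used, so this formulation is an assumption, and the probabilities are toy values.

```python
import math
from typing import Sequence


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """-mean(log D(x)) - mean(log(1 - D(G(z)))): rewards separating real from synthetic."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# d_real: discriminator outputs on measured micro-Doppler spectrograms
# d_fake: discriminator outputs on GAN-synthesized spectrograms
print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
print(generator_loss([0.2, 0.1]))
```

When the discriminator can no longer tell real from synthetic spectrograms (all outputs 0.5), its loss settles at 2 ln 2, which is one reason the authors' visual check of saved epochs is a useful complement to the loss values alone.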
Boidae
;
Classification
;
Dataset
;
Diagnosis
;
Energy Metabolism
;
Human Activities
;
Humans
;
Learning
;
Motion Perception
;
Musculoskeletal Diseases
;
Rehabilitation
;
Supervised Machine Learning
9.Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles.
Healthcare Informatics Research 2012;18(1):18-28
OBJECTIVES: Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates categorization models trained on a combination of included and commonly excluded articles, which can improve performance in identifying high quality articles for new procedure or drug SRs. METHODS: Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, based on a support vector machine, were trained on the combined data of the other topics, excluding the target topic. Training on the combination of included and commonly excluded articles was compared with training on the combination of included and excluded articles, using accuracy as the measure of comparison. RESULTS: On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when classification models trained on the combination of included and commonly excluded articles were used. The system using the combination of included and commonly excluded articles performed better than the system using the combination of included and excluded articles in all of the procedure topics. CONCLUSIONS: Automatic, rigorous article classification using machine learning can reduce experts' workload in performing systematic reviews when topic-specific data are scarce. In particular, the system is more effective when the combination of included and commonly excluded articles is used.
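The training scheme described above, in which the model for each target topic is trained only on pooled data from the other topics, is a leave-one-topic-out partition. The sketch below builds those partitions; the topic names, the article records, and the label encoding are hypothetical, and the support vector machine itself is omitted.

```python
from typing import Dict, Iterator, List, Tuple

Article = Tuple[str, int]  # (text, label): 1 = included, 0 = excluded (hypothetical encoding)


def leave_one_topic_out(
    data_by_topic: Dict[str, List[Article]],
) -> Iterator[Tuple[str, List[Article], List[Article]]]:
    """Yield (target topic, training pool from all other topics, target topic's articles)."""
    for target in data_by_topic:
        train_pool = [
            article
            for topic, articles in data_by_topic.items()
            if topic != target
            for article in articles
        ]
        yield target, train_pool, data_by_topic[target]


topics = {
    "procedure_A": [("abstract a1", 1), ("abstract a2", 0)],
    "procedure_B": [("abstract b1", 1)],
    "drug_C": [("abstract c1", 0), ("abstract c2", 1)],
}
for target, train, test in leave_one_topic_out(topics):
    print(target, len(train), len(test))
```

Holding the target topic out entirely mimics the situation the study targets: a new review for which no topic-specific training data exist yet.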
Evidence-Based Medicine
;
Machine Learning
;
Review Literature as Topic
;
Support Vector Machine
10.Sleep stage estimation method using a camera for home use
Teruaki NOCHINO ; Yuko OHNO ; Takafumi KATO ; Masako TANIIKE ; Shima OKADA
Biomedical Engineering Letters 2019;9(2):257-265
Recent studies have developed simple techniques for monitoring and assessing sleep. However, several issues, such as high-cost sensors and algorithms, remain to be solved before these techniques can serve as home-use devices. In this study, we aimed to develop an inexpensive and simple sleep monitoring system using a camera and video processing. Polysomnography (PSG) recordings were performed on six subjects for four consecutive nights, while the subjects' body movements were simultaneously recorded by a web camera. Body movement was extracted from the video data by video processing, and five parameters were calculated for machine learning. Four sleep stages (WAKE, LIGHT, DEEP, and REM) were estimated by applying these five parameters to a support vector machine. Compared with the correct sleep stages manually scored on PSG data by a sleep technician, the overall estimation accuracy was 70.3 ± 11.3%, with the highest accuracy for DEEP (82.8 ± 4.7%) and the lowest for LIGHT (53.0 ± 4.0%). Estimation accuracy for REM sleep was 68.0 ± 6.8%. The kappa was 0.19 ± 0.04 for all subjects. The present non-contact sleep monitoring system showed sufficient accuracy in sleep stage estimation, including REM sleep detection. The low computational cost of this system can be advantageous for mobile applications and for modularization into home devices.
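A gap like the one between the overall accuracy (70.3%) and the kappa (0.19) can arise because Cohen's kappa discounts the agreement expected by chance, which is large when one stage dominates the recordings. The sketch below computes kappa between technician-scored and estimated stages; the function name and the toy epoch sequences are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(reference: Sequence[Hashable], estimate: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(reference)
    observed = sum(r == e for r, e in zip(reference, estimate)) / n
    ref_counts, est_counts = Counter(reference), Counter(estimate)
    # Chance agreement: probability both raters pick the same stage independently
    chance = sum(ref_counts[label] * est_counts[label] for label in ref_counts) / (n * n)
    return (observed - chance) / (1 - chance)


psg = ["WAKE", "LIGHT", "LIGHT", "DEEP", "REM", "LIGHT"]   # technician-scored epochs (toy)
camera = ["WAKE", "LIGHT", "DEEP", "DEEP", "REM", "REM"]   # SVM-estimated epochs (toy)
print(round(cohen_kappa(psg, camera), 3))
```

The function is undefined (division by zero) when both sequences use a single identical label throughout, since chance agreement is then 1.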
Machine Learning
;
Methods
;
Mobile Applications
;
Polysomnography
;
Sleep Stages
;
Sleep, REM
;
Support Vector Machine