1.Semi-supervised Long-tail Endoscopic Image Classification.
Run-Nan CAO ; Meng-Jie FANG ; Hai-Ling LI ; Jie TIAN ; Di DONG
Chinese Medical Sciences Journal 2022;37(3):171-180
Objective To explore the semi-supervised learning (SSL) algorithm for long-tail endoscopic image classification with limited annotations. Method We explored semi-supervised long-tail endoscopic image classification in HyperKvasir, the largest public gastrointestinal dataset, with 23 diverse classes. The semi-supervised learning algorithm FixMatch, which is based on consistency regularization and pseudo-labeling, was applied. After splitting the training dataset and the test dataset at a ratio of 4:1, we sampled 20%, 50%, and 100% of the labeled training data to test the classification with limited annotations. Results The classification performance was evaluated by micro-average and macro-average evaluation metrics, with the Matthews correlation coefficient (MCC) as the overall evaluation. The SSL algorithm improved the classification performance, with MCC increasing from 0.8761 to 0.8850, from 0.8983 to 0.8994, and from 0.9075 to 0.9095 with 20%, 50%, and 100% of the labeled training data, respectively. With 20% of the labeled training data, SSL improved both the micro-average and macro-average classification performance, while for 50% and 100%, SSL improved the micro-average performance but hurt the macro-average performance. By analyzing the confusion matrix and labeling bias in each class, we found that the pseudo-labeling-based SSL algorithm exacerbated the classifier's preference for the head class, resulting in improved performance in the head class and degraded performance in the tail class. Conclusion SSL can improve the classification performance for semi-supervised long-tail endoscopic image classification, especially when the labeled data is extremely limited, which may benefit the building of assisted diagnosis systems for low-volume hospitals. However, the pseudo-labeling strategy may amplify the effect of class imbalance, which hurts the classification performance for the tail class.
Supervised Machine Learning
;
Algorithms
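The contrast between micro-average and macro-average performance reported in the abstract above can be illustrated with a minimal Python sketch. The toy labels, class names, and function names below are hypothetical and are not taken from the study; the sketch only shows how a classifier that favors the head class can raise the micro-average while lowering the macro-average.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def micro_recall(y_true, y_pred):
    """Pool all samples, so large classes dominate the score."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Average the per-class recalls, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Hypothetical toy data: 90 head-class samples followed by 10 tail-class samples.
y_true = ["head"] * 90 + ["tail"] * 10

# Baseline classifier: 80/90 head correct, 8/10 tail correct.
baseline = ["head"] * 80 + ["tail"] * 10 + ["tail"] * 8 + ["head"] * 2

# Head-biased classifier: 88/90 head correct, 5/10 tail correct.
biased = ["head"] * 88 + ["tail"] * 2 + ["tail"] * 5 + ["head"] * 5

for name, pred in [("baseline", baseline), ("head-biased", biased)]:
    print(name, round(micro_recall(y_true, pred), 4), round(macro_recall(y_true, pred), 4))
```

In this toy case the head-biased predictions score higher on the micro-average and lower on the macro-average, which is the same direction of change the abstract describes for the 50% and 100% label ratios.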
2.Prediction of trends for fine-scale spread of Oncomelania hupensis in Shanghai Municipality based on supervised machine learning models.
Yan Feng GONG ; Zhuo Wei LUO ; Jia Xin FENG ; Jing Bo XUE ; Zhao Yu GUO ; Yan Jun JIN ; Qing YU ; Shang XIA ; Shan LÜ ; Jing XU ; Shi Zhu LI
Chinese Journal of Schistosomiasis Control 2022;34(3):241-251
OBJECTIVE:
To predict the trends for fine-scale spread of Oncomelania hupensis based on supervised machine learning models in Shanghai Municipality, so as to provide insights into precision O. hupensis snail control.
METHODS:
Based on 2016 O. hupensis snail survey data in Shanghai Municipality and climatic, geographical, vegetation and socioeconomic data relating to O. hupensis snail distribution, seven supervised machine learning models were created to predict the risk of snail spread in Shanghai, including decision tree, random forest, generalized boosted model, support vector machine, naive Bayes, k-nearest neighbor and C5.0. The performance of seven models for predicting snail spread was evaluated with the area under the receiver operating characteristic curve (AUC), F1-score and accuracy, and optimal models were selected to identify the environmental variables affecting snail spread and predict the areas at risk of snail spread in Shanghai Municipality.
RESULTS:
Seven supervised machine learning models were successfully created to predict the risk of snail spread in Shanghai Municipality, and random forest (AUC = 0.901, F1-score = 0.840, ACC = 0.797) and generalized boosted model (AUC = 0.889, F1-score = 0.869, ACC = 0.835) showed higher predictive performance than the other models. Random forest analysis showed that the four most important climatic variables contributing to snail spread in Shanghai were aridity (11.87%), ≥ 0 °C annual accumulated temperature (10.19%), moisture index (10.18%) and average annual precipitation (9.86%), and the two most important vegetation variables were the vegetation index of the first quarter (8.30%) and the vegetation index of the second quarter (7.69%). Snails were more likely to spread at aridity of < 0.87, ≥ 0 °C annual accumulated temperature of 5 550 to 5 675 °C, moisture index of > 39% and average annual precipitation of > 1 180 mm, and with the vegetation index of the first quarter of > 0.4 and the vegetation index of the second quarter of > 0.6. According to the water resource developments and township administrative maps, the areas at risk of snail spread were mainly predicted in 10 townships/subdistricts, covering the Xipian, Dongpian and Tainan sections of southern Shanghai.
CONCLUSIONS:
Supervised machine learning models are effective for predicting the risk of fine-scale O. hupensis snail spread and identifying the environmental determinants relating to snail spread. The areas at risk of O. hupensis snail spread are mainly located in southwestern Songjiang District, northwestern Jinshan District and southeastern Qingpu District of Shanghai Municipality.
Animals
;
Bayes Theorem
;
China/epidemiology*
;
Ecosystem
;
Gastropoda
;
Supervised Machine Learning
3.Arousal and Valence Classification Model Based on Long Short-Term Memory and DEAP Data for Mental Healthcare Management.
Eun Jeong CHOI ; Dong Keun KIM
Healthcare Informatics Research 2018;24(4):309-316
OBJECTIVES: Both the valence and arousal components of affect are important considerations when managing mental healthcare because they are associated with affective and physiological responses. Research on arousal and valence analysis that applies deep learning to images, texts, and physiological signals is actively underway, and research investigating how to improve the recognition rate is needed. The goal of this research was to design a deep learning framework and model to classify arousal and valence, indicating positive and negative degrees of emotion as high or low. METHODS: The proposed arousal and valence classification model to analyze the affective state was tested using data from 40 channels provided by a dataset for emotion analysis using electroencephalography (EEG), physiological, and video signals (the DEAP dataset). Experiments were based on 10 selected featured central and peripheral nervous system data points, using long short-term memory (LSTM) as a deep learning method. RESULTS: The arousal and valence were classified and visualized on a two-dimensional coordinate plane. Profiles were designed depending on the number of hidden layers, nodes, and hyperparameters according to the error rate. The experimental results show arousal and valence classification accuracies of 74.65% and 78%, respectively. The proposed model performed better than previous models. CONCLUSIONS: The proposed model appears to be effective in analyzing arousal and valence; specifically, it is expected that affective analysis using physiological signals based on LSTM will be possible without manual feature extraction. In a future study, the classification model will be adopted in mental healthcare management systems.
Arousal*
;
Classification*
;
Dataset
;
Delivery of Health Care*
;
Electrocardiography
;
Learning
;
Machine Learning
;
Memory, Short-Term*
;
Methods
;
Peripheral Nervous System
;
Supervised Machine Learning
4.Augmentation of Doppler Radar Data Using Generative Adversarial Network for Human Motion Analysis
Ibrahim ALNUJAIM ; Youngwook KIM
Healthcare Informatics Research 2019;25(4):344-349
OBJECTIVES: Human motion analysis can be applied to the diagnosis of musculoskeletal diseases, rehabilitation therapies, fall detection, and estimation of energy expenditure. To analyze human motion with micro-Doppler signatures measured by radar, a deep learning algorithm is one of the most effective approaches. Because deep learning requires a large data set, the high cost involved in measuring large amounts of human data is an intrinsic problem. The objective of this study is to augment human motion micro-Doppler data employing generative adversarial networks (GANs) to improve the accuracy of human motion classification. METHODS: To test data augmentation provided by GANs, authentic data for 7 human activities were collected using micro-Doppler radar. Each motion yielded 144 data samples. Software including GPU driver, CUDA library, cuDNN library, and Anaconda were installed to train the GANs. Keras-GPU, SciPy, Pillow, OpenCV, Matplotlib, and Git were used to create an Anaconda environment. The data produced by GANs were saved every 300 epochs, and the training was stopped at 3,000 epochs. The images generated from each epoch were evaluated, and the best images were selected. RESULTS: Each data set of the micro-Doppler signatures, consisting of 144 data samples, was augmented to produce 1,472 synthesized spectrograms of 64 × 64. Using the augmented spectrograms, the deep neural network was trained, increasing the accuracy of human motion classification. CONCLUSIONS: Data augmentation to increase the amount of training data was successfully conducted through the use of GANs. Thus, augmented micro-Doppler data can contribute to improving the accuracy of human motion recognition.
Boidae
;
Classification
;
Dataset
;
Diagnosis
;
Energy Metabolism
;
Human Activities
;
Humans
;
Learning
;
Motion Perception
;
Musculoskeletal Diseases
;
Rehabilitation
;
Supervised Machine Learning
5.Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm
Jae Hong LEE ; Do hyung KIM ; Seong Nyum JEONG ; Seong Ho CHOI
Journal of Periodontal & Implant Science 2018;48(2):114-123
PURPOSE: The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). METHODS: Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. RESULTS: The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%–91.2%) for premolars and 73.4% (95% CI, 59.9%–84.0%) for molars. CONCLUSIONS: We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.
Area Under Curve
;
Artificial Intelligence
;
Bicuspid
;
Boidae
;
Dataset
;
Diagnosis
;
Learning
;
Machine Learning
;
Methods
;
Molar
;
Periodontal Diseases
;
ROC Curve
;
Sensitivity and Specificity
;
Supervised Machine Learning
;
Tooth
;
Weights and Measures
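The diagnostic measures named in the abstract above (accuracy, sensitivity, specificity, positive and negative predictive value) all follow from the four cells of a binary confusion matrix. The Python sketch below shows the arithmetic; the counts are hypothetical and are not the study's data, and the function name is an illustrative assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts, chosen only for illustration.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```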
6.Heart Alert: A heart disease prediction system using machine learning approach and optimization techniques
Justin Allen P. Denopol ; Ma. Sheila A. Magboo ; Vincent Peter C. Magboo
Philippine Journal of Health Research and Development 2022;26(3):83-92
Background:
Cardiovascular diseases are among the top three leading causes of mortality in the Philippines, accounting for 17.8% of total deaths. Lifestyle-related habits such as alcohol consumption, smoking, poor diet and nutrition, high sedentary behavior, overweight, and obesity have been increasingly implicated in the high rates of heart disease among Filipinos, placing a significant burden on the country's healthcare system. The objective of this study was to predict the presence of heart disease using various machine learning algorithms (support vector machine, naïve Bayes, random forest, logistic regression, decision tree, and adaptive boosting) evaluated on an anonymized publicly available cardiovascular disease dataset.
Methodology:
Various machine learning algorithms were applied to an anonymized publicly available cardiovascular dataset from a machine learning data repository (IEEE Dataport). A web-based application system named Heart Alert, which predicts the risk of developing heart disease, was developed based on the best machine learning model. The effects of different optimization techniques on the classification performance of the machine learning algorithms were assessed, covering the imputation methods (mean, median, mode, and multiple imputation by chained equations) and the feature selection method (recursive feature elimination). All simulation experiments were implemented in Python 3.8 and its machine learning libraries (Scikit-learn, Keras, TensorFlow, Pandas, Matplotlib, Seaborn, NumPy).
Results:
The support vector machine without imputation and feature selection obtained the highest performance metrics (90.2% accuracy, 87.7% sensitivity, 93.6% specificity, 94.9% precision, 91.2% F1-score and an area under the receiver operating characteristic curve of 0.902) and was used to implement the heart disease prediction system (Heart Alert). Following very closely were random forest with mean or median imputation and logistic regression with mode imputation, all without feature selection, which also performed well.
Conclusion:
The performance of the best four machine learning models suggests that, for this dataset, imputation of missing values is optional. Likewise, recursive feature elimination for feature selection may not apply, as all variables seem to be important in heart disease prediction. An early accurate diagnosis leading to prompt intervention is crucial, as it improves the patient's quality of life and diminishes the risk of developing cardiac events.
Machine Learning
;
Support Vector Machine
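The single-value imputation strategies compared in the abstract above (mean, median, and mode) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the function name, the `None` marker for missing values, and the toy column are hypothetical, and multiple imputation by chained equations is not covered.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical feature column with two missing entries.
column = [200, None, 240, 200, None, 310]

print(impute(column, "mean"))
print(impute(column, "median"))
print(impute(column, "mode"))
```

Each strategy fills every gap in a column with one summary value, so the choice mainly matters when the observed values are skewed or contain outliers.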
7.Population Pharmacokinetic and Pharmacodynamic Models of Propofol in Healthy Volunteers using NONMEM and Machine Learning Methods.
Yoo Mi KIM ; Sung Hong KANG ; Il Su PARK ; Gyu Jeong NOH
Journal of Korean Society of Medical Informatics 2008;14(2):147-159
OBJECTIVES: The primary objective of this study was to compare the model performance of machine learning methods with that of a previous study in which a nonlinear mixed effects model was created using NONMEM(R) for the pharmacokinetic and pharmacodynamic data for propofol. The secondary objective was to evaluate whether a pharmacodynamic model describing the relationship between the dose of propofol and bispectral index (BIS) outperforms one describing the relationship between the propofol concentrations predicted by a pharmacokinetic model and BIS. METHODS: Data were collected during a study involving the infusion of propofol into healthy volunteers. Pharmacokinetic and pharmacodynamic models were constructed using artificial neural networks (ANNs), support vector machines (SVMs), and multi-method ensembles and were compared with the nonlinear mixed effects method as implemented by NONMEM(R). Model performance was assessed by goodness-of-fit statistics, paired t-tests between predicted and observed values for each model, and scatterplots. RESULTS: In pharmacokinetic analysis, ensemble I, the mean of ANN and NONMEM(R) predictions, achieved minimal error and the highest correlation coefficient. SVM produced the highest error and the lowest correlation coefficient. In pharmacodynamic analysis, ANN exhibited the best performance. An ANN model describing the relationship between the dose of propofol and BIS was not inferior to an ANN model describing the relationship between predicted concentrations of propofol derived from an ANN pharmacokinetic model and BIS. CONCLUSIONS: In pharmacokinetic analysis, the ensemble combined with ANN achieved slightly better performance than NONMEM(R). The relationship between the dose of propofol and BIS can be predicted without considering the pharmacokinetics of propofol.
Machine Learning
;
Propofol
;
Support Vector Machine
8.MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique.
Journal of Biomedical Engineering 2016;33(1):72-77
Considering the low prediction accuracy for positive samples and the poor overall classification caused by unbalanced sample data of microRNA (miRNA) targets, we propose a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm, an ensemble learning algorithm based on under-sampling. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming to reduce the degree of imbalance between positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample weight smoothing mechanism so as to avoid over-learning. Finally, the integrated miRNA target classifier is obtained by combining multiple weak classifiers through a voting mechanism. The experiment revealed that SVM-IUSM, compared with other algorithms on a collection of unbalanced datasets, could not only improve the accuracy for positive targets and the overall classification, but also enhance the generalization ability of the miRNA target classifier.
Algorithms
;
MicroRNAs
;
chemistry
;
Support Vector Machine
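The idea of under-sampling the majority class, which the abstract above embeds in each boosting iteration, can be sketched in Python. Note the substitution: the abstract describes clustering-based under-sampling inside AdaBoost, whereas this sketch uses plain random under-sampling as a simpler stand-in. The function name, the seed parameter, and the toy data are hypothetical.

```python
import random

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class.

    A simplified stand-in: random selection replaces the clustering-based
    selection described in the abstract.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_keep = min(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        for sample in rng.sample(group, n_keep):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical unbalanced data: 5 positive and 45 negative samples.
samples = list(range(50))
labels = [1] * 5 + [0] * 45

balanced_samples, balanced_labels = random_undersample(samples, labels)
print(balanced_labels.count(1), balanced_labels.count(0))
```

Random selection discards majority samples without regard to their distribution; selecting representatives per cluster, as the abstract describes, is one way to preserve more of that structure.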
9.MicroRNA target prediction based on SVM and the optimized feature set.
Baowen WANG ; Xiaoyang QI ; Changwu WANG ; Wenyuan LIU ; Yali SI
Journal of Biomedical Engineering 2013;30(6):1213-1218
MicroRNAs (miRNAs) are a family of endogenous single-stranded RNAs about 22 nucleotides in length. By targeting the 3' UTR of messenger RNA (mRNA), they play important roles in post-transcriptional regulation. To further the study of miRNA function, more positive miRNA targets urgently need to be identified. To address the high-dimensional, small-sample data sets in miRNA target prediction, an algorithm for eliminating redundant features is proposed based on ν-SVM, fusing classification and feature selection. The algorithm optimizes the combination of features and then constructs the best feature combination to represent the miRNA-target interaction model. The prior parameter ν (0 < ν ≤ 1) controls the compression proportion of the data set and selects more discriminative support vectors. Finally, the classifier model for miRNA target prediction is built. An unbiased assessment of the classifier is achieved with a completely independent test dataset. Experimental results indicated that, in both classification and generalization performance for miRNA target prediction, this model was superior to existing machine learning algorithms such as miTarget, NBmiRTar and TargetMiner.
MicroRNAs
;
Models, Theoretical
;
Support Vector Machine
10.Classification Model of Corneal Opacity Based on Digital Image Features.
Peng LUO ; Jilong ZHENG ; Peng ZHOU ; Yongde ZHANG ; Shijie CHANG ; Xianzheng SHA
Chinese Journal of Medical Instrumentation 2021;45(4):361-365
OBJECTIVE:
Based on the digital image features of corneal opacity, a multi-class support vector machine (SVM) classification model was established to explore an objective method for quantifying corneal opacity.
METHODS:
The cornea digital images of dead pigs were collected, part of the color features and texture features were extracted according to the previous experience, and the SVM multi classification model was established. The test results of the model were evaluated by precision, sensitivity and
RESULTS:
In the classification of corneal opacity, the highest
CONCLUSIONS:
The multi-class SVM classification model can classify the degree of corneal opacity.
Animals
;
Corneal Opacity
;
Support Vector Machine
;
Swine