1.Semi-supervised Long-tail Endoscopic Image Classification.
Run-Nan CAO ; Meng-Jie FANG ; Hai-Ling LI ; Jie TIAN ; Di DONG
Chinese Medical Sciences Journal 2022;37(3):171-180
Objective To explore semi-supervised learning (SSL) algorithms for long-tail endoscopic image classification with limited annotations. Method We studied semi-supervised long-tail endoscopic image classification on HyperKvasir, the largest public gastrointestinal dataset, which contains 23 diverse classes. The SSL algorithm FixMatch, which combines consistency regularization and pseudo-labeling, was applied. After splitting the data into training and test sets at a ratio of 4:1, we sampled 20%, 50%, and 100% of the labeled training data to test classification with limited annotations. Results Classification performance was evaluated with micro-average and macro-average metrics, with the Matthews correlation coefficient (MCC) as the overall measure. The SSL algorithm improved classification performance, with the MCC increasing from 0.8761 to 0.8850, from 0.8983 to 0.8994, and from 0.9075 to 0.9095 at the 20%, 50%, and 100% labeled-data ratios, respectively. At the 20% ratio, SSL improved both micro-average and macro-average performance, while at the 50% and 100% ratios it improved micro-average performance but hurt macro-average performance. By analyzing the confusion matrix and the labeling bias in each class, we found that the pseudo-labeling-based SSL algorithm exacerbated the classifier's preference for the head classes, improving performance on the head classes while degrading performance on the tail classes. Conclusion SSL can improve classification performance for semi-supervised long-tail endoscopic image classification, especially when labeled data are extremely limited, which may benefit the development of assisted diagnosis systems for low-volume hospitals. However, the pseudo-labeling strategy may amplify the effect of class imbalance, which hurts classification performance on the tail classes.
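The abstract does not include code; the following is a minimal, hypothetical sketch of the FixMatch unlabeled-data step it describes (confidence-thresholded pseudo-labels from weakly augmented views, consistency enforced on strongly augmented views) together with the MCC evaluation, assuming a PyTorch-style classifier `model`, pre-augmented batches, and a 0.95 confidence threshold, none of which are specified by the paper.

```python
# Hypothetical sketch of the FixMatch unlabeled-loss step; `model`, the
# augmented batches, and the 0.95 threshold are assumptions, not the
# authors' implementation.
import torch
import torch.nn.functional as F
from sklearn.metrics import matthews_corrcoef

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch, threshold=0.95):
    """Pseudo-label weakly augmented images, then enforce consistency
    on strongly augmented views of the same images."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_batch), dim=1)   # predictions on weak views
        confidence, pseudo_labels = probs.max(dim=1)      # hard pseudo-labels
        mask = (confidence >= threshold).float()          # keep only confident ones
    logits_strong = model(strong_batch)
    per_sample = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (per_sample * mask).mean()

# Overall evaluation metric used in the study:
# mcc = matthews_corrcoef(y_true, y_pred)
```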
Supervised Machine Learning
;
Algorithms
2.Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm
Jae Hong LEE ; Do hyung KIM ; Seong Nyum JEONG ; Seong Ho CHOI
Journal of Periodontal & Implant Science 2018;48(2):114-123
PURPOSE: The aim of this study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate its potential usefulness and accuracy for the diagnosis and prediction of periodontally compromised teeth (PCT). METHODS: Periapical radiographic images were used to determine the optimal CNN algorithm and weights by combining a pretrained deep CNN architecture with a self-trained network. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on the Keras framework in Python. RESULTS: The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%–91.2%) for premolars and 73.4% (95% CI, 59.9%–84.0%) for molars. CONCLUSIONS: We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method for diagnosing and predicting PCT.
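The paper reports a Keras-based deep CNN that combines a pretrained architecture with a self-trained network; since the exact backbone and head are not given in the abstract, the sketch below is only an assumed transfer-learning setup of that kind, using VGG16, a 224×224 input, and a binary PCT output as placeholder choices.

```python
# Assumed transfer-learning sketch in Keras: pretrained backbone plus a
# small self-trained head for binary PCT classification. The backbone
# (VGG16), input size, and head layers are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

backbone = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
backbone.trainable = False                       # keep pretrained weights frozen

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),        # self-trained dense head
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),       # PCT vs. non-PCT
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
```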
Area Under Curve
;
Artificial Intelligence
;
Bicuspid
;
Boidae
;
Dataset
;
Diagnosis
;
Learning
;
Machine Learning
;
Methods
;
Molar
;
Periodontal Diseases
;
ROC Curve
;
Sensitivity and Specificity
;
Supervised Machine Learning
;
Tooth
;
Weights and Measures
3.Central limit theorem: the cornerstone of modern statistics.
Korean Journal of Anesthesiology 2017;70(2):144-156
According to the central limit theorem, the means of random samples of size n drawn from a population with mean µ and variance σ² are approximately normally distributed with mean µ and variance σ²/n. Using the central limit theorem, a variety of parametric tests have been developed under assumptions about the parameters that determine the population probability distribution. Compared with non-parametric tests, which do not require any assumptions about the population probability distribution, parametric tests produce more accurate and precise estimates with higher statistical power. However, many medical researchers use parametric tests to present their data without knowing how the central limit theorem contributed to the development of such tests. Thus, this review presents the basic concepts of the central limit theorem and its role in binomial distributions and Student's t-test, and provides an example of the sampling distributions of small populations. A proof of the central limit theorem is also described, together with the mathematical concepts required for a near-complete understanding of it.
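As a quick illustration of the theorem the review discusses (not taken from the article), the simulation below draws repeated samples of size n = 30 from a skewed exponential population and checks that the sample means cluster around µ with variance close to σ²/n.

```python
# Illustrative sketch: sample means from a skewed Exp(1) population
# (mu = 1, sigma^2 = 1) are approximately normal for moderate n, with
# mean close to mu and variance close to sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 30, 10_000

sample_means = rng.exponential(scale=1.0, size=(n_samples, n)).mean(axis=1)

print(round(sample_means.mean(), 3))   # ~1.000  (close to mu)
print(round(sample_means.var(), 4))    # ~0.0333 (close to sigma^2 / n = 1/30)
```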
Mathematical Concepts
;
Normal Distribution
;
Statistical Distributions
4.Anesthesia research in the artificial intelligence era.
Hyung Chul LEE ; Chul Woo JUNG
Anesthesia and Pain Medicine 2018;13(3):248-255
A noteworthy change in recent medical research is the rapid increase in studies using big data obtained from electronic medical records (EMR), order communication systems (OCS), and picture archiving and communication systems (PACS). It is often difficult to apply traditional statistical techniques to such research because of the vastness of the data and the complexity of the relationships within it. Therefore, the application of artificial intelligence (AI) techniques that can handle such problems is becoming popular. Classical machine learning techniques, such as k-means clustering, the support vector machine, and the decision tree, are still efficient and useful for some research problems. Deep learning techniques, such as the multi-layer perceptron, convolutional neural network, and recurrent neural network, have been spotlighted owing to the success of deep belief networks and convolutional neural networks in solving various problems that are difficult to solve by conventional methods. The results of recent research using AI techniques are comparable to those of human experts. This article introduces technologies that help researchers conduct medical research and understand the previous literature in the era of AI.
Anesthesia*
;
Artificial Intelligence*
;
Decision Trees
;
Humans
;
Learning
;
Machine Learning
;
Medical Records
;
Neural Networks (Computer)
;
Radiology Information Systems
;
Support Vector Machine
5.Application of Deep Learning System into the Development of Communication Device for Quadriplegic Patient
Jung Hwan LEE ; Taewoo KANG ; Byung Kwan CHOI ; In Ho HAN ; Byung Chul KIM ; Jung Hoon RO
Korean Journal of Neurotrauma 2019;15(2):88-94
OBJECTIVE: In general, quadriplegic patients use their voices to call their caregivers. However, severely affected quadriplegic patients who have undergone tracheostomy cannot generate a voice, and they require other communication tools to call caregivers. Recently, monitoring of eye status using artificial intelligence (AI) has been widely used in various fields. We built an eye status monitoring system using deep learning and developed a communication system through which quadriplegic patients can call their caregivers. METHODS: The communication system consists of 3 programs. The first program was developed to automatically capture eye images from the face using a webcam; it continuously captured and stored 15 eye images per second. Second, the captured eye images were classified as open or closed by deep learning, a type of AI. Google TensorFlow was used as the machine learning library for the convolutional neural network. A total of 18,000 images were used to train the deep learning system. Finally, a program was developed to emit a sound when the left eye remained closed for 3 seconds. RESULTS: The test accuracy for eye status was 98.7%. In practice, when a quadriplegic patient looked at the webcam and closed his left eye for 3 seconds, the sound for calling a caregiver was generated. CONCLUSION: Our AI-based eye status detection software is highly accurate, and the calling system for quadriplegic patients was satisfactory.
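The pipeline described above (classify each captured eye frame as open or closed, then call the caregiver when the left eye stays closed for 3 seconds at roughly 15 frames per second) can be sketched as follows; the classifier, frame source, and sound call are placeholders rather than the authors' code.

```python
# Hypothetical sketch of the trigger logic: a CNN labels each eye frame
# open/closed, and the call sound is emitted once the left eye has stayed
# closed for ~3 seconds at ~15 frames per second (45 consecutive frames).
import collections

FRAMES_PER_SECOND = 15
CLOSED_SECONDS_TO_TRIGGER = 3
WINDOW = FRAMES_PER_SECOND * CLOSED_SECONDS_TO_TRIGGER

recent_states = collections.deque(maxlen=WINDOW)

def update(eye_is_closed: bool) -> bool:
    """Feed one classified frame; return True when the caregiver call
    should be triggered (left eye closed for the whole window)."""
    recent_states.append(eye_is_closed)
    return len(recent_states) == WINDOW and all(recent_states)

# Usage sketch inside the capture loop (`classify` and `play_call_sound`
# are placeholder names for the TensorFlow CNN and the sound step):
# if update(classify(frame) == "closed"):
#     play_call_sound()
```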
Artificial Intelligence
;
Caregivers
;
Humans
;
Learning
;
Machine Learning
;
Quadriplegia
;
Tracheostomy
;
Unsupervised Machine Learning
;
Voice
6.Comparison of Models for the Prediction of Medical Costs of Spinal Fusion in Taiwan Diagnosis-Related Groups by Machine Learning Algorithms
Ching Yen KUO ; Liang Chin YU ; Hou Chaung CHEN ; Chien Lung CHAN
Healthcare Informatics Research 2018;24(1):29-37
OBJECTIVES: The aims of this study were to compare the performance of machine learning methods for predicting the medical costs associated with spinal fusion, in terms of profit or loss under Taiwan Diagnosis-Related Groups (Tw-DRGs), and to apply these methods to explore the important factors associated with the medical costs of spinal fusion. METHODS: A dataset was obtained from a regional hospital in Taoyuan City, Taiwan, containing data from 2010 to 2013 on patients in Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities). Naïve Bayes, support vector machine, logistic regression, C4.5 decision tree, and random forest methods were employed for prediction using WEKA 3.8.1. RESULTS: Five hundred thirty-two cases were categorized as belonging to the Tw-DRG49702 group. The mean medical cost was US $4,549.7, the mean age of the patients was 62.4 years, and the mean length of stay was 9.3 days. Length of stay was an important determinant of medical costs for patients undergoing spinal fusion. The random forest method had the best predictive performance among the compared methods, achieving an accuracy of 84.30%, a sensitivity of 71.4%, a specificity of 92.2%, and an AUC of 0.904. CONCLUSIONS: Our study demonstrated that the random forest model can be used to predict the medical costs of Tw-DRG49702 and could inform hospital strategy for improving the financial management efficiency of this procedure.
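The study ran its five classifiers in WEKA 3.8.1; the sketch below is only an analogous setup in Python with scikit-learn (not the authors' workflow), comparing the same family of models by cross-validated AUC on a placeholder feature matrix `X` and profit/loss label `y`.

```python
# Analogous sketch only: the study used WEKA 3.8.1, not scikit-learn.
# X (case features) and y (profit vs. loss) are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(),          # CART here, C4.5 in WEKA
    "random forest": RandomForestClassifier(n_estimators=100),
}

# for name, clf in models.items():
#     auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
#     print(name, round(auc, 3))
```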
Area Under Curve
;
Costs and Cost Analysis
;
Dataset
;
Decision Trees
;
Diagnosis-Related Groups
;
Financial Management
;
Forests
;
Humans
;
Length of Stay
;
Logistic Models
;
Machine Learning
;
Methods
;
Sensitivity and Specificity
;
Spinal Fusion
;
Support Vector Machine
;
Taiwan
7.Prediction of trends for fine-scale spread of Oncomelania hupensis in Shanghai Municipality based on supervised machine learning models.
Yan Feng GONG ; Zhuo Wei LUO ; Jia Xin FENG ; Jing Bo XUE ; Zhao Yu GUO ; Yan Jun JIN ; Qing YU ; Shang XIA ; Shan LÜ ; Jing XU ; Shi Zhu LI
Chinese Journal of Schistosomiasis Control 2022;34(3):241-251
OBJECTIVE:
To predict the trends for fine-scale spread of Oncomelania hupensis based on supervised machine learning models in Shanghai Municipality, so as to provide insights into precision O. hupensis snail control.
METHODS:
Based on the 2016 O. hupensis snail survey data for Shanghai Municipality and on climatic, geographical, vegetation, and socioeconomic data relating to O. hupensis snail distribution, seven supervised machine learning models were created to predict the risk of snail spread in Shanghai: decision tree, random forest, generalized boosted model, support vector machine, naïve Bayes, k-nearest neighbors, and C5.0. The performance of the seven models in predicting snail spread was evaluated with the area under the receiver operating characteristic curve (AUC), F1-score, and accuracy, and the optimal models were selected to identify the environmental variables affecting snail spread and to predict the areas at risk of snail spread in Shanghai Municipality.
RESULTS:
Seven supervised machine learning models were successfully created to predict the risk of snail spread in Shanghai Municipality; the random forest (AUC = 0.901, F1-score = 0.840, ACC = 0.797) and generalized boosted model (AUC = 0.889, F1-score = 0.869, ACC = 0.835) showed higher predictive performance than the other models. Random forest analysis showed that the most important climatic variables contributing to snail spread in Shanghai were aridity (11.87%), ≥ 0 °C annual accumulated temperature (10.19%), moisture index (10.18%), and average annual precipitation (9.86%), and the two most important vegetation variables were the vegetation index of the first quarter (8.30%) and the vegetation index of the second quarter (7.69%). Snails were more likely to spread at an aridity of < 0.87, a ≥ 0 °C annual accumulated temperature of 5,550 to 5,675 °C, a moisture index of > 39%, and an average annual precipitation of > 1,180 mm, and with a first-quarter vegetation index of > 0.4 and a second-quarter vegetation index of > 0.6. According to water resource development and township administrative maps, the areas at risk of snail spread were mainly predicted in 10 townships/subdistricts, covering the Xipian, Dongpian, and Tainan sections of southern Shanghai.
CONCLUSIONS:
Supervised machine learning models are effective for predicting the risk of fine-scale O. hupensis snail spread and for identifying the environmental determinants of snail spread. The areas at risk of O. hupensis snail spread are mainly located in southwestern Songjiang District, northwestern Jinshan District, and southeastern Qingpu District of Shanghai Municipality.
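The paper does not state its software; as a hedged, analogous sketch only, the snippet below shows how a random forest's relative variable importance (the kind of percentage ranking reported for aridity, accumulated temperature, moisture index, and the vegetation indices) could be computed in Python with scikit-learn, with `X_env` and `y_snail` as placeholder inputs.

```python
# Analogous sketch, not the authors' code: fit a random forest on
# environmental predictors and rank relative variable importance (%).
# X_env (rows = survey sites, columns = named predictors) and y_snail
# (snail presence/spread, 0 or 1) are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_importance(X_env: pd.DataFrame, y_snail) -> pd.Series:
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit(X_env, y_snail)
    importance = pd.Series(forest.feature_importances_, index=X_env.columns)
    return (100 * importance / importance.sum()).sort_values(ascending=False)

# rank_importance(X_env, y_snail) returns each predictor's contribution
# as a percentage, analogous to the values reported in the RESULTS.
```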
Animals
;
Bayes Theorem
;
China/epidemiology*
;
Ecosystem
;
Gastropoda
;
Supervised Machine Learning
8.A Comparison of Intensive Care Unit Mortality Prediction Models through the Use of Data Mining Techniques.
Sujin KIM ; Woojae KIM ; Rae Woong PARK
Healthcare Informatics Research 2011;17(4):232-243
OBJECTIVES: The intensive care environment generates a wealth of critical care data suited to developing a well-calibrated prediction tool. This study was conducted to develop an intensive care unit (ICU) mortality prediction model built on University of Kentucky Hospital (UKH) data and to assess whether various data mining techniques, such as the artificial neural network (ANN), support vector machine (SVM), and decision tree (DT), outperform the conventional logistic regression (LR) statistical model. METHODS: The models were built on ICU data collected for 38,474 admissions to the UKH between January 1998 and September 2007. Data from the first 24 hours of each ICU admission were used, including patient demographics, admission information, physiology data, chronic health items, and outcome information. RESULTS: Only 15 study variables were identified as significant for inclusion in model development. The DT algorithm slightly outperformed the other data mining techniques (AUC, 0.892), followed by the SVM (AUC, 0.876) and the ANN (AUC, 0.874), compared with the APACHE III performance (AUC, 0.871). CONCLUSIONS: With fewer variables needed, the machine learning algorithms that we developed proved to be as good as the conventional APACHE III prediction.
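As an illustration of the comparison the study reports (not its actual pipeline or data), the sketch below fits the four model families on a held-out split and compares them by AUC; `X` and `y` are placeholder inputs standing in for the UKH ICU variables and the mortality outcome.

```python
# Illustrative sketch with placeholder data: compare classifiers by AUC
# on a held-out set, as the study does for LR, DT, ANN, and SVM.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_auc(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=0)
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(max_depth=5),
        "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
        "SVM": SVC(probability=True),
    }
    for name, clf in models.items():
        clf.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")
```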
APACHE
;
Critical Care
;
Data Mining
;
Decision Trees
;
Demography
;
Humans
;
Intensive Care Units
;
Kentucky
;
Logistic Models
;
Machine Learning
;
Support Vector Machine
9.Prediction of Return-to-original-work after an Industrial Accident Using Machine Learning and Comparison of Techniques.
Journal of Korean Medical Science 2018;33(19):e144-
BACKGROUND: Many studies have tried to develop predictors of return-to-work (RTW). However, because complex factors have been shown to predict RTW, it is difficult to use them in practice. This study investigated whether factors used in previous studies could predict whether an individual had returned to his or her original work four years after termination of the worker's recovery period. METHODS: An initial logistic regression analysis of 1,567 participants in the fourth Panel Study of Workers' Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were built, and the important variables of each model were identified. The predictive abilities of the different models were compared. RESULTS: The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not substantial. CONCLUSION: It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy.
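The first analysis step described (a logistic regression yielding odds ratios) can be sketched as below; this is a generic, hedged example with placeholder predictors rather than the panel-study variables, using statsmodels to obtain the odds ratios and their confidence intervals.

```python
# Hedged sketch of the odds-ratio step; X (predictor matrix) and y
# (returned to original work, 0/1) are placeholders, not the panel data.
import numpy as np
import statsmodels.api as sm

def odds_ratios(X, y):
    """Fit a logistic regression and return odds ratios with 95% CIs."""
    X_const = sm.add_constant(X)           # add an intercept column
    fit = sm.Logit(y, X_const).fit(disp=0)
    point = np.exp(fit.params)             # exponentiated coefficients = ORs
    ci = np.exp(fit.conf_int())            # 95% confidence intervals for ORs
    return point, ci
```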
Accidents, Occupational*
;
Dataset
;
Decision Trees
;
Forests
;
Insurance
;
Logistic Models
;
Machine Learning*
;
Odds Ratio
;
Return to Work
;
Support Vector Machine
;
Workers' Compensation
10.Quadrature Doppler ultrasound signal denoising based on adapted local cosine transform.
Xiaotao WANG ; Yi SHEN ; Zhiyan LIU
Journal of Biomedical Engineering 2006;23(5):1114-1117
The spectrogram of the Doppler ultrasound signal is widely used in clinical diagnosis. Additional frequency components arising from noise internal or external to the system adversely affect its subjective and quantitative analysis. A novel approach based on the adapted local cosine transform and non-negative Garrote thresholding was proposed to remove noise from the quadrature Doppler signal. First, the directional information was extracted from the quadrature signal. Then, the denoising method based on the adapted local cosine transform was applied to the forward and backward flow signals separately. Finally, the estimated signal was reconstructed from the denoised signals using the Hilbert transform. In the simulation study, both the mean frequency and the spectral width waveforms of the denoised signal were examined. The simulation results showed that this approach was superior to a wavelet-transform-based approach, especially under low signal-to-noise ratio (SNR) conditions.
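The adapted local cosine transform itself is not available as an off-the-shelf routine; the sketch below is a simplified stand-in (a plain block DCT) used only to illustrate the non-negative Garrote thresholding step the paper applies, with the directional separation and Hilbert-transform reconstruction noted in comments.

```python
# Simplified, assumption-laden sketch, not the authors' implementation:
# a plain block DCT stands in for the adapted local cosine transform so
# that the non-negative Garrote thresholding step can be illustrated.
import numpy as np
from scipy.fft import dct, idct

def garrote_threshold(coeffs, lam):
    """Non-negative Garrote shrinkage: d - lam**2 / d if |d| > lam, else 0."""
    out = np.zeros_like(coeffs)
    keep = np.abs(coeffs) > lam
    out[keep] = coeffs[keep] - lam**2 / coeffs[keep]
    return out

def denoise_directional(signal, lam, block=256):
    """Blockwise DCT denoising of one directional (forward or backward)
    flow signal, as a stand-in for the adapted local cosine transform."""
    out = np.empty(len(signal), dtype=float)
    for start in range(0, len(signal), block):
        seg = np.asarray(signal[start:start + block], dtype=float)
        coeffs = garrote_threshold(dct(seg, norm="ortho"), lam)
        out[start:start + len(seg)] = idct(coeffs, norm="ortho")
    return out

# In the paper's pipeline, the forward and backward flow signals are first
# separated from the quadrature Doppler signal, each is denoised, and the
# estimated quadrature signal is then reconstructed via the Hilbert
# transform (e.g., scipy.signal.hilbert).
```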
Algorithms
;
Artifacts
;
Computer Simulation
;
Fourier Analysis
;
Humans
;
Nonlinear Dynamics
;
Signal Processing, Computer-Assisted
;
Stochastic Processes
;
Ultrasonics