1. Research on eye movement data classification using support vector machine with improved whale optimization algorithm.
Yinhong SHEN ; Chang ZHANG ; Lin YANG ; Yuanyuan LI ; Xiujuan ZHENG
Journal of Biomedical Engineering 2023;40(2):335-342
Support vector machines are highly sensitive to their parameters when classifying eye movement patterns across different tasks. To address this problem, we propose using an improved whale optimization algorithm to tune the support vector machine and thereby improve eye movement data classification. Based on the characteristics of eye movement data, this study first extracts 57 features related to fixation and saccade, then applies the ReliefF algorithm for feature selection. Because the whale algorithm suffers from low convergence accuracy and a tendency to fall into local minima, we introduce inertia weights to balance local and global search and accelerate convergence, and use a differential mutation strategy to increase individual diversity and escape local optima. Experiments on eight test functions show that the improved whale algorithm achieves the best convergence accuracy and convergence speed. Finally, the support vector machine optimized by the improved whale algorithm is applied to classifying eye movement data in autism; experimental results on a public dataset show that its classification accuracy is greatly improved over the traditional support vector machine method. Compared with the standard whale algorithm and other optimization algorithms, the proposed model has higher recognition accuracy and provides a new approach to eye movement pattern recognition. In the future, the method could be combined with eye trackers to acquire eye movement data and assist medical diagnosis.
Animals; Support Vector Machine; Whales; Eye Movements; Algorithms
2. An Atrial Fibrillation Classification Method Study Based on BP Neural Network and SVM.
Chenqin LIU ; Gaozang LIN ; Jingjing ZHOU ; Jilun YE ; Xu ZHANG
Chinese Journal of Medical Instrumentation 2023;47(3):258-263
Atrial fibrillation is a common arrhythmia, and its diagnosis is affected by many factors. Automatic detection of atrial fibrillation is important for making diagnosis more widely applicable and for raising automatic analysis to expert level. This study proposes an automatic detection algorithm for atrial fibrillation based on a back propagation (BP) neural network and a support vector machine (SVM). The electrocardiogram (ECG) recordings in the MIT-BIH atrial fibrillation database are divided into segments of 10, 32, 64, and 128 heartbeats, and the Lorentz value, Shannon entropy, K-S test value and exponential moving average value are calculated for each. These four characteristic parameters are used as the input of the SVM and BP neural network for classification and testing, and the labels given by experts in the MIT-BIH atrial fibrillation database are used as the reference output. The first 18 records in the database are used as the training set and the last 7 records as the test set. The results show an accuracy of 92% for the 10-heartbeat segments and 98% for the other three segment lengths. The sensitivity and specificity are both above 97.7%, indicating a degree of applicability. Further validation and improvement on clinical ECG data will be carried out in a follow-up study.
Humans; Atrial Fibrillation/diagnosis*; Support Vector Machine; Heart Rate; Algorithms; Neural Networks, Computer; Electrocardiography
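Two of the four features named in the abstract above, Shannon entropy and the exponential moving average, can be illustrated with a minimal Python sketch over RR intervals. The bin width, the smoothing factor `alpha`, the function names and the use of RR intervals in milliseconds are all assumptions made for illustration; the abstract does not state how the authors computed these values.

```python
import math
from collections import Counter


def shannon_entropy(rr_ms, bin_ms=50):
    """Shannon entropy (in bits) of a histogram of RR intervals.

    rr_ms  : RR intervals in integer milliseconds (assumed input format)
    bin_ms : histogram bin width in milliseconds (assumed value)
    """
    if not rr_ms:
        raise ValueError("need at least one RR interval")
    # Count how many intervals fall into each fixed-width bin.
    bins = Counter(rr // bin_ms for rr in rr_ms)
    n = len(rr_ms)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def exponential_moving_average(values, alpha=0.2):
    """Standard recursive EMA: ema[t] = alpha * x[t] + (1 - alpha) * ema[t-1]."""
    if not values:
        return []
    ema = [values[0]]
    for x in values[1:]:
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema


regular = [800] * 10                # perfectly regular rhythm
irregular = [600, 900, 700, 1100]   # every interval lands in a different bin
print(shannon_entropy(regular), shannon_entropy(irregular))
```

A perfectly regular rhythm gives an entropy of zero, while irregular intervals spread over more bins and raise it, which is the intuition behind using entropy as an irregularity feature.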
3. Origin identification of Poria cocos based on hyperspectral imaging technology.
Xue SUN ; Deng-Ting ZHANG ; Hui WANG ; Cong ZHOU ; Jian YANG ; Dai-Yin PENG ; Xiao-Bo ZHANG
China Journal of Chinese Materia Medica 2023;48(16):4337-4346
To achieve non-destructive, rapid, batch discrimination of the origin of Poria cocos, this study established a P. cocos origin recognition model based on hyperspectral imaging combined with machine learning. P. cocos samples from Anhui, Fujian, Guangxi, Hubei, Hunan, Henan and Yunnan were used as the research objects. Hyperspectral data were collected in the visible and near-infrared band (V-band, 410-990 nm) and the shortwave infrared band (S-band, 950-2500 nm). The original spectral data were divided into S-band, V-band and full-band. The original data (RD) of each band were pretreated with multiplicative scatter correction (MSC), standard normal variate (SNV), S-G smoothing (SGS), first derivative (FD), second derivative (SD) and other methods. The data were then classified at three levels of producing area: province, county and batch. Origin identification models were established by partial least squares discriminant analysis (PLS-DA) and linear support vector machine (LinearSVC). Finally, a confusion matrix was used to evaluate the optimal model, with the F1 score as the evaluation standard. The results revealed that the model established by FD combined with LinearSVC had the highest prediction accuracy in the full-band range classified by province, the V-band range by county and the full-band range by batch, at 99.28%, 98.55% and 97.45%, respectively; the overall F1 scores of these three models were 99.16%, 98.59% and 97.58%, respectively, indicating excellent performance. Therefore, hyperspectral imaging combined with LinearSVC can realize the non-destructive, accurate and rapid identification of P. cocos from different producing areas in batches, which is conducive to the directional research and production of P. cocos.
Hyperspectral Imaging; Wolfiporia; China; Least-Squares Analysis; Support Vector Machine
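Two of the spectral pretreatments listed in the abstract above, SNV and the first derivative, can be sketched in plain Python as follows. This is only an illustrative sketch: the function names are hypothetical, and the simple finite-difference derivative is an assumption, since the abstract does not say how the derivative was computed.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    if sd == 0:
        raise ValueError("flat spectrum cannot be SNV-scaled")
    return [(x - mean) / sd for x in spectrum]


def first_derivative(spectrum):
    """First derivative as a simple difference between adjacent bands
    (an assumed stand-in for whatever derivative filter was really used)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


raw = [1.0, 2.0, 3.0]
print(snv(raw))               # each spectrum is scaled independently
print(first_derivative(raw))  # one value fewer than the input
```

Each spectrum is processed on its own, so these steps can be applied per sample before any classifier such as PLS-DA or a linear SVM is fitted.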
4. Detection method of early heart valve diseases based on heart sound features.
Chengfa SUN ; Xinpei WANG ; Changchun LIU
Journal of Biomedical Engineering 2023;40(6):1160-1167
Heart valve disease (HVD) is one of the common cardiovascular diseases, and heart sound is an important physiological signal for diagnosing it. This paper proposes a model that combines basic component features and envelope autocorrelation features to detect early HVDs. First, heart sound signals lasting 5 minutes were denoised by the empirical mode decomposition (EMD) algorithm and segmented. The basic component features and envelope autocorrelation features of the heart sound segments were then extracted to construct a heart sound feature set, and the max-relevance and min-redundancy (MRMR) algorithm was used to select the optimal mixed feature subset. Finally, decision tree, support vector machine (SVM) and k-nearest neighbor (KNN) classifiers were trained to distinguish early HVDs from normal heart sounds, obtaining a best accuracy of 99.9% on a clinical database. When normal valve, abnormal semilunar valve and abnormal atrioventricular valve heart sounds were classified, the best accuracy was 99.8%; when normal valve, single-valve abnormal and multi-valve abnormal heart sounds were classified, the best accuracy was 98.2%. On a public database, the method also obtained good overall accuracy. The results demonstrate that the proposed method has important value for the clinical diagnosis of early HVDs.
Humans; Heart Sounds; Heart Valve Diseases/diagnosis*; Algorithms; Support Vector Machine; Signal Processing, Computer-Assisted
5. A preliminary prediction model of depression based on whole blood cell count by machine learning method.
Jing YAN ; Xin Yuan LI ; Yu Lan GENG ; Yu Fang LIANG ; Chao CHEN ; Ze Wen HAN ; Rui ZHOU
Chinese Journal of Preventive Medicine 2023;57(11):1862-1868
This study combined machine learning techniques with routine blood cell analysis parameters to build preliminary prediction models that help differentiate patients with depression from healthy controls or from patients with anxiety. A multicenter study was performed using blood cell analysis data collected at Beijing Chaoyang Hospital and the First Hospital of Hebei Medical University from 2020 to 2021. Machine learning techniques, including support vector machine, decision tree, naïve Bayes, random forest and multi-layer perceptron, were explored to establish a prediction model of depression. Based on the blood cell analysis results of the healthy control and depression groups, the accuracy of the prediction model reached as high as 0.99 and F1 was 0.975; the area under the receiver operating characteristic curve and the average accuracy were 0.985 and 0.967, respectively. Platelet parameters contributed most to the depression prediction model. For the random forest differential diagnosis model based on data from the depression and anxiety groups, prediction accuracy reached 0.68 and AUC 0.622; age, platelet parameters and the average volume of red blood cells contributed the most to the model. In conclusion, the study investigated prediction models of depression based on blood cell analysis parameters and suggests that machine learning models are more objective in the evaluation of mental illness.
Humans; Depression; Bayes Theorem; Machine Learning; Support Vector Machine; Blood Cell Count
7. Rapid identification of geographic origins of Zingiberis Rhizoma by NIRS combined with chemometrics and machine learning algorithms.
Dai-Xin YU ; Sheng GUO ; Xia ZHANG ; Hui YAN ; Zhen-Yu ZHANG ; Hai-Yang LI ; Jian YANG ; Jin-Ao DUAN
China Journal of Chinese Materia Medica 2022;47(17):4583-4592
In this study, 280 batches of Zingiberis Rhizoma samples from nine producing areas were analyzed by near-infrared spectroscopy (NIRS) to obtain infrared spectral information. Multiple chemometric and machine learning methods, including principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), K-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), artificial neural network (ANN), and gradient boosting (GB), were applied to trace origins. The results showed that spectral preprocessing by standard normal variate transformation coupled with the first derivative gave a discriminative accuracy of 93.9% and could be used to construct the discrimination model. PCA and PLS-DA score plots showed that samples from Shandong, Sichuan, Yunnan, and Guizhou could be effectively distinguished, but the remaining samples partially overlapped. With the machine learning algorithms, the AUC values of KNN, SVM, RF, ANN, and GB were 0.96, 0.99, 0.99, 0.99, and 0.98, respectively, with overall prediction accuracies of 83.3%, 89.3%, 90.5%, 91.7%, and 89.3%. This indicated that the developed model was reliable and that machine learning combined with NIRS was feasible for origin identification. OPLS-DA showed that Zingiberis Rhizoma from Sichuan (the genuine producing area) could be significantly distinguished from that of other regions with good discriminative accuracy, suggesting that the NIRS method established in this study, combined with chemometrics, can be used to identify Zingiberis Rhizoma from Sichuan. This study established a rapid, nondestructive identification method with reliable data analysis for the origin identification of Zingiberis Rhizoma, which is expected to provide a new approach to tracing the origin of Chinese medicinal materials.
Algorithms; Chemometrics; China; Ginger; Least-Squares Analysis; Plant Extracts; Principal Component Analysis; Support Vector Machine
8. ST segment morphological classification based on support vector machine multi feature fusion.
Haiman DU ; Ting BIAN ; Peng XIONG ; Jianli YANG ; Jieshuo ZHANG ; Xiuling LIU
Journal of Biomedical Engineering 2022;39(4):702-712
ST segment morphology is closely related to cardiovascular disease. It is used not only to characterize different diseases but also to predict disease severity. However, the short duration, low energy, variable morphology and interference from various noises make ST segment morphology classification a difficult task. To address the problems of single-feature extraction and low classification accuracy, this paper uses the gradient of the ST surface to improve the accuracy of multi-class ST segment morphology classification. Five ST segment morphologies are identified: normal, upward-sloping elevation, arch-back elevation, horizontal depression, and arch-back depression. First, a candidate ST segment is selected according to the QRS complex location and medical statistical rules. Second, the ST segment area, mean value, difference from the reference baseline, slope, and mean squared error features are extracted. In addition, the ST segment is converted into a surface, the gradient features of the ST surface are extracted, and the morphological features are formed into a feature vector. Finally, a support vector machine is used to classify the ST segment and then perform multi-class classification of its morphology. The MIT-Beth Israel Hospital Database (MITDB) and the European ST-T database (EDB) were used to validate the algorithm, which achieved average ST segment recognition rates of 97.79% and 95.60%, respectively. Based on these results, the method is expected to be introduced into clinical settings in the future to provide morphological guidance for the diagnosis of cardiovascular diseases and to improve diagnostic efficiency.
Algorithms; Arrhythmias, Cardiac; Databases, Factual; Electrocardiography/methods*; Humans; Support Vector Machine
9. Semi-supervised Long-tail Endoscopic Image Classification.
Run-Nan CAO ; Meng-Jie FANG ; Hai-Ling LI ; Jie TIAN ; Di DONG
Chinese Medical Sciences Journal 2022;37(3):171-180
Objective To explore the semi-supervised learning (SSL) algorithm for long-tail endoscopic image classification with limited annotations. Method We explored semi-supervised long-tail endoscopic image classification in HyperKvasir, the largest gastrointestinal public dataset with 23 diverse classes. Semi-supervised learning algorithm FixMatch was applied based on consistency regularization and pseudo-labeling. After splitting the training dataset and the test dataset at a ratio of 4:1, we sampled 20%, 50%, and 100% labeled training data to test the classification with limited annotations. Results The classification performance was evaluated by micro-average and macro-average evaluation metrics, with the Matthews correlation coefficient (MCC) as the overall evaluation. SSL algorithm improved the classification performance, with MCC increasing from 0.8761 to 0.8850, from 0.8983 to 0.8994, and from 0.9075 to 0.9095 with 20%, 50%, and 100% ratio of labeled training data, respectively. With a 20% ratio of labeled training data, SSL improved both the micro-average and macro-average classification performance; while for the ratio of 50% and 100%, SSL improved the micro-average performance but hurt macro-average performance. Through analyzing the confusion matrix and labeling bias in each class, we found that the pseudo-label-based SSL algorithm exacerbated the classifier's preference for the head class, resulting in improved performance in the head class and degenerated performance in the tail class. Conclusion SSL can improve the classification performance for semi-supervised long-tail endoscopic image classification, especially when the labeled data is extremely limited, which may benefit the building of assisted diagnosis systems for low-volume hospitals. However, the pseudo-labeling strategy may amplify the effect of class imbalance, which hurts the classification performance for the tail class.
Supervised Machine Learning; Algorithms
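The Matthews correlation coefficient used as the overall metric in the abstract above can be illustrated with a short Python sketch of its multi-class generalization. The function name and the toy labels are hypothetical, and returning 0.0 when the denominator is zero is a convention assumed here; the abstract does not describe the authors' implementation.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient computed from label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correctly classified
    t_counts = Counter(y_true)                         # true occurrences per class
    p_counts = Counter(y_pred)                         # predictions per class
    classes = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    # Assumed convention: report 0.0 when only one class is present or predicted.
    return cov_tp / denom if denom else 0.0


truth = [0, 0, 1, 1, 2, 2]
print(multiclass_mcc(truth, truth))    # perfect agreement
print(multiclass_mcc(truth, [0] * 6))  # classifier that always predicts class 0
```

Because the score collapses to zero for a classifier that only ever predicts one class, it is less flattering than plain accuracy on imbalanced, long-tail label distributions.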
10. Prediction of trends for fine-scale spread of Oncomelania hupensis in Shanghai Municipality based on supervised machine learning models.
Yan Feng GONG ; Zhuo Wei LUO ; Jia Xin FENG ; Jing Bo XUE ; Zhao Yu GUO ; Yan Jun JIN ; Qing YU ; Shang XIA ; Shan LÜ ; Jing XU ; Shi Zhu LI
Chinese Journal of Schistosomiasis Control 2022;34(3):241-251
OBJECTIVE:
To predict the trends for fine-scale spread of Oncomelania hupensis based on supervised machine learning models in Shanghai Municipality, so as to provide insights into precision O. hupensis snail control.
METHODS:
Based on 2016 O. hupensis snail survey data in Shanghai Municipality and climatic, geographical, vegetation and socioeconomic data relating to O. hupensis snail distribution, seven supervised machine learning models were created to predict the risk of snail spread in Shanghai, including decision tree, random forest, generalized boosted model, support vector machine, naive Bayes, k-nearest neighbor and C5.0. The performance of seven models for predicting snail spread was evaluated with the area under the receiver operating characteristic curve (AUC), F1-score and accuracy, and optimal models were selected to identify the environmental variables affecting snail spread and predict the areas at risk of snail spread in Shanghai Municipality.
RESULTS:
Seven supervised machine learning models were successfully created to predict the risk of snail spread in Shanghai Municipality, and random forest (AUC = 0.901, F1-score = 0.840, ACC = 0.797) and generalized boosted model (AUC= 0.889, F1-score = 0.869, ACC = 0.835) showed higher predictive performance than other models. Random forest analysis showed that the three most important climatic variables contributing to snail spread in Shanghai included aridity (11.87%), ≥ 0 °C annual accumulated temperature (10.19%), moisture index (10.18%) and average annual precipitation (9.86%), the two most important vegetation variables included the vegetation index of the first quarter (8.30%) and vegetation index of the second quarter (7.69%). Snails were more likely to spread at aridity of < 0.87, ≥ 0 °C annual accumulated temperature of 5 550 to 5 675 °C, moisture index of > 39% and average annual precipitation of > 1 180 mm, and with the vegetation index of the first quarter of > 0.4 and the vegetation index of the first quarter of > 0.6. According to the water resource developments and township administrative maps, the areas at risk of snail spread were mainly predicted in 10 townships/subdistricts, covering the Xipian, Dongpian and Tainan sections of southern Shanghai.
CONCLUSIONS:
Supervised machine learning models are effective to predict the risk of fine-scale O. hupensis snail spread and identify the environmental determinants relating to snail spread. The areas at risk of O. hupensis snail spread are mainly located in southwestern Songjiang District, northwestern Jinshan District and southeastern Qingpu District of Shanghai Municipality.
Animals; Bayes Theorem; China/epidemiology*; Ecosystem; Gastropoda; Supervised Machine Learning
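The three evaluation metrics named in the abstract above (AUC, F1-score and accuracy) can be sketched for a binary classifier in plain Python as follows. The function names and toy data are hypothetical, and the pairwise-comparison form of AUC is one standard way to compute it, not necessarily the one the authors used.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(labels, predictions):
    """Accuracy and F1-score for binary labels coded as 0 and 1."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


labels = [0, 0, 1, 1]
print(roc_auc(labels, [0.1, 0.4, 0.35, 0.8]))
print(accuracy_and_f1(labels, [0, 0, 0, 1]))
```

AUC is computed from continuous risk scores, whereas accuracy and F1 need hard class predictions, so the three numbers reported for each model can rank models differently.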
