A partition bagging ensemble learning algorithm for Parkinson's speech data mining.

Yongming LI; Cheng ZHANG; Pin WANG; Tingjie XIE; Xiaoping ZENG; Yanling ZHANG; Oumei CHENG; Fang YAN

Author: Yongming LI¹,²; Cheng ZHANG³; Pin WANG³; Tingjie XIE³; Xiaoping ZENG³; Yanling ZHANG⁴; Oumei CHENG⁵; Fang YAN³
Author Information

1. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, P.R. China.
2. Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 400044, P.R. China. Email: yongmingli@cqu.edu.cn
3. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, P.R. China.
4. Department of Neurology, Southwest Hospital, Third Military Medical University, Chongqing 400038, P.R. China.
5. Department of Neurology, The First Affiliated Hospital, Chongqing Medical University, Chongqing 400016, P.R. China.
Publication Type: Journal Article
Keywords: Parkinson’s disease; classification; ensemble learning; partition bagging boosting mechanism; speech data
MeSH: Algorithms; Data Mining; Humans; Machine Learning; Parkinson Disease; diagnosis; Speech
From: Journal of Biomedical Engineering 2019;36(4):548-556
Country: China
Language: Chinese
Abstract: Methods for diagnosing Parkinson's disease (PD) through speech data mining have proven effective in recent years. However, owing to factors such as the disease severity of the subjects and the data collection equipment and environment, samples of different categories overlap (alias) in the sample space of the acquired dataset. Samples in the aliased region are difficult to identify correctly, which seriously degrades the classification accuracy of the algorithm. To address this problem, this article proposes a partition bagging ensemble learning algorithm. It measures the aliasing degree of each sample with a purpose-designed metric, the ratio of sample centroid distances, and uses it to divide the training set into multiple subsets. A transfer-training step for misclassified samples then adjusts the subset partition. Finally, the test results of the sub-classifiers are integrated using optimized weights. Experimental results on two public datasets show that the proposed method significantly improves classification accuracy, raising mean accuracy by up to 25.44%. The method not only effectively improves classification accuracy on PD speech datasets but also increases the sample utilization rate, offering a new approach to PD diagnosis.
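The abstract names the stages of the pipeline without giving their formulas. As a minimal sketch, assuming that the "ratio of sample centroid distances" means the distance from a sample to its own class centroid divided by its distance to the nearest other-class centroid, the stages could be wired together as below. The nearest-centroid sub-classifier, the equal-size split by aliasing ratio, the single-pass transfer rule and the accuracy-based weights are all hypothetical stand-ins (the paper describes optimized weights), and function names such as `aliasing_ratio` and `transfer_misclassified` are illustrative, not the authors' code.

```python
import math
from collections import defaultdict


def centroids(X, y):
    """Mean feature vector of each class label."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def aliasing_ratio(x, label, cents):
    """Assumed metric: distance to own-class centroid divided by distance to the
    nearest other-class centroid. Values near or above 1 suggest an aliased sample."""
    own = math.dist(x, cents[label])
    other = min(math.dist(x, c) for k, c in cents.items() if k != label)
    return own / other if other > 0 else math.inf


def nearest_centroid(cents, x):
    """Hypothetical sub-classifier: predict the label of the closest centroid."""
    return min(cents, key=lambda c: math.dist(x, cents[c]))


def partition(X, y, n_subsets):
    """Sort samples from least to most aliased and cut them into contiguous subsets."""
    cents = centroids(X, y)
    order = sorted(range(len(X)), key=lambda i: aliasing_ratio(X[i], y[i], cents))
    size = math.ceil(len(order) / n_subsets)
    return [order[i:i + size] for i in range(0, len(order), size)]


def transfer_misclassified(X, y, subsets):
    """Hypothetical one-pass adjustment: samples that a subset's own model
    misclassifies are moved into the next (more aliased) subset."""
    adjusted = [list(s) for s in subsets]
    for j in range(len(adjusted) - 1):
        cents = centroids([X[i] for i in adjusted[j]], [y[i] for i in adjusted[j]])
        wrong = {i for i in adjusted[j] if nearest_centroid(cents, X[i]) != y[i]}
        adjusted[j] = [i for i in adjusted[j] if i not in wrong]
        adjusted[j + 1].extend(sorted(wrong))
    return adjusted


def fit_ensemble(X, y, n_subsets=2):
    """One sub-classifier per subset, weighted (assumption) by its accuracy
    on the whole training set instead of the paper's optimized weights."""
    models = []
    for idx in transfer_misclassified(X, y, partition(X, y, n_subsets)):
        sub_y = [y[i] for i in idx]
        if len(set(sub_y)) < 2:  # skip subsets that contain only one class
            continue
        cents = centroids([X[i] for i in idx], sub_y)
        weight = sum(nearest_centroid(cents, x) == t for x, t in zip(X, y)) / len(X)
        models.append((cents, weight))
    return models


def predict(models, x):
    """Weighted vote over the sub-classifiers."""
    votes = defaultdict(float)
    for cents, weight in models:
        votes[nearest_centroid(cents, x)] += weight
    return max(votes, key=votes.get)
```

For example, on two well-separated toy classes, `fit_ensemble(X, y, 2)` returns two weighted sub-classifiers and `predict` takes their weighted vote. On real PD speech features, a stronger base learner and a genuine weight-optimization step would presumably replace these placeholders.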