SPNG+: A Stacking Ensemble Method to Predict Non-classical Secreted Proteins in Gram-positive Bacteria
10.13865/j.cnki.cjbmb.2021.05.1011
- Author:
Wei DAI 1; Jun-Wei XU 1; He-Jie WANG 1; Qi LI 1
Author Information
1. Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology
- Publication Type:Journal Article
- Keywords:
Gram-positive bacteria;
machine learning;
non-classical secreted protein;
stacking
- From:
Chinese Journal of Biochemistry and Molecular Biology
2021;37(7):937-947
- Country:China
- Language:Chinese
- Abstract:
Gram-positive bacteria secrete virulence factors into host cells and cause suppurative inflammation, which leads to disease and poses a great threat to human health. Identifying secreted proteins helps to understand the secretion systems and pathogenic mechanisms of bacteria and lays the foundation for further screening of pathogenic factors. Because non-classical secreted proteins lack a classical signal peptide sequence, identifying them in large-scale experiments is relatively difficult and time-consuming. Some computational prediction methods have been proposed, but their performance in predicting non-classical secreted proteins of Gram-positive bacteria is not satisfactory. This paper proposes an ensemble learning model, SPNG+, which integrates six machine learning algorithms through a stacking strategy: naive Bayes, random forest, support vector machine, K-nearest neighbor, and two gradient boosting tree methods, XGBoost and LightGBM. The results of 5-fold cross-validation and an independent dataset test show that SPNG+ outperforms single machine learning models, a simple ensemble learning model, and existing prediction tools in predicting non-classical secreted proteins of Gram-positive bacteria. Compared with earlier predictors built on a limited set of feature encoding methods or a single machine learning algorithm, the proposed method is a useful supplement to the study of non-classical secreted proteins in Gram-positive bacteria. The source code of SPNG+ is available from https://github.com/weidai00/SPNG.