Prediction of outer membrane proteins using support vector machine with combined features.
- Author: Lingyun ZOU (1); Zhengzhi WANG; Yongxian WANG
- Author Information:
1. School of Mechatronics and Automatization, National University of Defence Technology, Changsha 410073, China. lyzou@nudt.edu.cn
- Publication Type: Journal Article
- MeSH: Algorithms; Amino Acids/chemistry; Bacterial Outer Membrane Proteins/chemistry; Bacterial Outer Membrane Proteins/genetics; Computational Biology/methods; Discriminant Analysis; Genome, Bacterial/genetics; Gram-Negative Bacteria/genetics; Models, Statistical; Protein Structure, Secondary; Protein Structure, Tertiary
- From: Chinese Journal of Biotechnology 2008;24(4):651-658
- Country: China
- Language: Chinese
- Abstract:
Outer membrane proteins (OMPs) are embedded in the outer membranes of Gram-negative bacteria, mitochondria, and chloroplasts. Their cellular location and functional diversity make them an important class of proteins. Research on predicting OMPs with bioinformatics methods can provide useful approaches for identifying OMPs in genomic sequences and for successfully predicting their secondary and tertiary structures. In this paper, three classes of features were calculated from protein sequences: amino acid composition, dipeptide composition, and weighted amino acid index correlation coefficients. These three feature classes were then combined and input into a support vector machine (SVM)-based predictor to discriminate OMPs from proteins of other folding types. Discrimination results were computed for several combined feature sets involving four categories of amino acid indices, and the effect on discrimination accuracy of using correlation coefficients of different orders and weights was discussed. In cross-validation and independent tests for identifying OMPs in a dataset of 1087 proteins spanning all types of globular and membrane proteins, the combined-feature method achieved overall accuracies of 96.96% and 97.33%, respectively, outperforming other methods reported in the literature. When applied to identifying OMPs in five bacterial genomes, the method showed high specificity, and it correctly discriminated over 99% of the OMPs with known three-dimensional structures in the PDB database. These results indicate that the method is a powerful tool for discriminating OMPs in genomes.
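The abstract describes building one combined feature vector per sequence before SVM classification. The sketch below illustrates what such a vector might look like, under stated assumptions; it is not the authors' implementation. The abstract does not give the exact correlation formula, the four amino acid index categories, or the orders and weights used. Here we assume a correlation coefficient of order k defined as the mean product of standardized index values k residues apart (in the spirit of sequence-order correlation factors), multiplied by a weight w. The Kyte-Doolittle hydropathy scale stands in as the single example index. The names `combined_features`, `max_order`, and `weight` are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

# Example index (assumption): Kyte-Doolittle hydropathy values.
# The paper used four categories of amino acid indices, not listed in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize_index(index):
    """Standardize an index to zero mean and unit variance over the 20 amino acids."""
    values = [index[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (index[a] - mean) / std for a in AMINO_ACIDS}


def aa_composition(seq):
    """20 features: fraction of each standard amino acid in the sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400 features: fraction of each overlapping adjacent residue pair."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return [counts[d] / total for d in DIPEPTIDES]


def weighted_correlations(seq, index, max_order, weight):
    """max_order features (assumed form): weight * mean of h(R_i) * h(R_{i+k})."""
    h = normalize_index(index)
    feats = []
    for k in range(1, max_order + 1):
        n = len(seq) - k
        theta = sum(h[seq[i]] * h[seq[i + k]] for i in range(n)) / n
        feats.append(weight * theta)
    return feats


def combined_features(seq, index=KYTE_DOOLITTLE, max_order=5, weight=0.1):
    """Hypothetical helper: concatenate the three feature classes (20 + 400 + max_order)."""
    # Keep only the 20 standard residues; other symbols are dropped.
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    if len(seq) <= max(1, max_order):
        raise ValueError("sequence must be longer than max_order (and at least 2 residues)")
    return (aa_composition(seq)
            + dipeptide_composition(seq)
            + weighted_correlations(seq, index, max_order, weight))


if __name__ == "__main__":
    vec = combined_features("MKKTAIAIAVALAGFATVAQA")
    print(len(vec))  # 20 + 400 + 5 = 425
```

Vectors of this kind could then be passed to an off-the-shelf SVM implementation, for example scikit-learn's `sklearn.svm.SVC`. The kernel, its parameters, and any feature scaling used in the paper are not stated in the abstract. Varying `max_order` and `weight` corresponds to the "different orders and weights" the abstract says were compared.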