Classification of multi-class homo-oligomer based on a novel method of feature extraction from protein primary structure.
- Author:
Shaowu ZHANG (1); Quan PAN; Chunhui ZHAO; Yongmei CHENG
Author Information
1. School of Automatic Control, Northwestern Polytechnic University, Xi'an 710072, China. zhangsw@nwpu.edu.cn
- Publication Type:Journal Article
- MeSH:
Algorithms;
Amino Acid Sequence;
Artificial Intelligence;
Cluster Analysis;
Humans;
Models, Molecular;
Molecular Sequence Data;
Protein Conformation;
Proteins;
chemistry;
classification;
Sequence Analysis, Protein;
methods
- From:
Journal of Biomedical Engineering
2007;24(4):721-726
- Country:China
- Language:Chinese
- Abstract:
A novel method of feature extraction from protein primary structure is proposed and applied to classify protein homodimers, homotrimers, homotetramers and homohexamers. In this method, each protein sequence is represented by a feature vector composed of its amino acid composition and a set of weighted auto-correlation function factors of an amino acid residue index. High classification accuracies are obtained. For example, with the same support vector machine (SVM), the total accuracies of the QIANA, AIANB, MEEJ, ROBB and SNEP sets based on this method are 77.63%, 77.16%, 76.46%, 76.70% and 75.06%, respectively, in the jackknife test; these are 6.39, 5.92, 5.22, 5.46 and 3.82 percentage points higher, respectively, than that of the COMP set, which is based on the conventional method using amino acid composition alone. With the same QIANA set, the total accuracy of the SVM is 77.63%, which is 16.29 percentage points higher than that of the covariant discriminant algorithm. These results show that: (1) the novel feature extraction method is effective and feasible, and the feature vectors based on it may contain more information about protein quaternary structure and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches buried in the interfaces of associated subunits; (2) the SVM can be regarded as a powerful computational tool for classifying protein homo-oligomers.
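The feature vector described in the abstract (amino acid composition plus weighted auto-correlation factors of a residue index) can be illustrated with a minimal Python sketch. The abstract does not give the residue index, the number of lags, the weight, or the exact form of the auto-correlation function, so everything below is an assumption made for illustration: the Kyte-Doolittle hydropathy scale stands in for the residue index, the lag-k factor is taken as the mean product of index values k residues apart, and `max_lag`, `weight` and the function names are hypothetical. It is not the authors' implementation.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy values used as the residue index.
# The abstract does not say which index each feature set was built on.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def autocorrelation(seq: str, index: Dict[str, float], max_lag: int) -> List[float]:
    """Assumed form: r_k = (1 / (N - k)) * sum_i h_i * h_(i+k), for k = 1..max_lag."""
    h = [index[a] for a in seq]
    n = len(h)
    return [
        sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def feature_vector(
    seq: str,
    max_lag: int = 5,        # hypothetical number of lags
    weight: float = 0.1,     # hypothetical weight on the auto-correlation part
    index: Dict[str, float] = KYTE_DOOLITTLE,
) -> List[float]:
    """Concatenate composition (20 values) with weighted auto-correlation factors."""
    seq = seq.upper()
    if any(a not in index for a in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    return composition(seq) + [weight * r for r in autocorrelation(seq, index, max_lag)]


if __name__ == "__main__":
    vec = feature_vector("MKTAYIAKQRQISFVKSHFSRQ", max_lag=3)
    print(len(vec))  # 20 composition values + 3 auto-correlation factors
```

Under these assumptions, a vector of this kind would be computed for every sequence and passed to a classifier such as an SVM; dropping the auto-correlation part leaves the composition-only representation that the abstract calls the conventional method.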