SuccSite: Incorporating Amino Acid Composition and Informative k-spaced Amino Acid Pairs to Identify Protein Succinylation Sites
- Author:
Kao HUI-JU¹; Nguyen VAN-NUI; Huang KAI-YAO; Chang WEN-CHI; Lee TZONG-YI
Author Information
1. Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan, China
- Keywords:
Protein succinylation;
Succinyl group;
Substrate specificity;
Amino acid composition;
k-spaced amino acid pair composition
- From:
Genomics, Proteomics & Bioinformatics
2020;18(2):208-219
- Country: China
- Language: Chinese
- Abstract:
Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to a lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty of experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylation sites based on amino acid composition. Using experimentally verified lysine succinylation sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, the support vector machine, to train a predictive model using not only the amino acid composition but also the composition of k-spaced amino acid pairs. After the best model was selected using ten-fold cross-validation, it significantly outperformed existing tools on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
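
The two feature types named in the abstract, amino acid composition and the composition of k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. The abstract does not state the window length, the values of k, or the normalization used, so the function names `aac` and `cksaap`, the alphabet ordering, and the choice to normalize by the number of valid pairs are assumptions made for illustration only, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids; this ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window.

    Non-standard characters (e.g. a padding symbol) are ignored.
    """
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on an empty window
    return [residues.count(a) / total for a in AMINO_ACIDS]


def cksaap(window, k):
    """Composition of k-spaced amino acid pairs.

    A k-spaced pair is two residues with exactly k residues between them,
    i.e. positions i and i + k + 1. Returns a 400-dimensional vector of
    pair frequencies, normalized by the number of valid pairs (assumption).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(window) - k - 1):
        pair = window[i] + window[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard characters
            counts[pair] += 1
            total += 1
    total = total or 1
    return [counts[p] / total for p in pairs]
```

For a sequence window centered on a candidate lysine, concatenating `aac(window)` with `cksaap(window, k)` for a few values of k would give one fixed-length feature vector per site, which is the kind of input a support vector machine expects.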