1. GPS 5.0: An Update on the Prediction of Kinase-specific Phosphorylation Sites in Proteins
Wang CHENWEI ; Xu HAODONG ; Lin SHAOFENG ; Deng WANKUN ; Zhou JIAQI ; Zhang YING ; Shi YING ; Peng DI ; Xue YU
Genomics, Proteomics & Bioinformatics 2020;18(1):72-80
In eukaryotes, protein phosphorylation is specifically catalyzed by numerous protein kinases (PKs), faithfully orchestrates various biological processes, and reversibly determines cellular dynamics and plasticity. Here we report an updated algorithm of Group-based Prediction System (GPS) 5.0 to improve the performance for predicting kinase-specific phosphorylation sites (p-sites). Two novel methods, position weight determination (PWD) and scoring matrix optimization (SMO), were developed. Compared with other existing tools, GPS 5.0 exhibits a highly competitive accuracy. Besides serine/threonine or tyrosine kinases, GPS 5.0 also supports the prediction of dual-specificity kinase-specific p-sites. In the classical module of GPS 5.0, 617 individual predictors were constructed for predicting p-sites of 479 human PKs. To extend the application of GPS 5.0, a species-specific module was implemented to predict kinase-specific p-sites for 44,795 PKs in 161 eukaryotes. The online service and local packages of GPS 5.0 are freely available for academic research at http://gps.biocuckoo.cn.
2. HybridSucc: A Hybrid-learning Architecture for General and Species-specific Succinylation Site Prediction
Ning WANSHAN ; Xu HAODONG ; Jiang PEIRAN ; Cheng HAN ; Deng WANKUN ; Guo YAPING ; Xue YU
Genomics, Proteomics & Bioinformatics 2020;18(2):194-207
As an important protein acylation modification, lysine succinylation (Ksucc) is involved in diverse biological processes, and participates in human tumorigenesis. Here, we collected 26,243 non-redundant known Ksucc sites from 13 species as the benchmark data set, combined 10 types of informative features, and implemented a hybrid-learning architecture by integrating deep-learning and conventional machine-learning algorithms into a single framework. We constructed a new tool named HybridSucc, which achieved area under curve (AUC) values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 17.84%–50.62% better than that of other existing tools. Using HybridSucc, we conducted a proteome-wide prediction and prioritized 370 cancer mutations that change Ksucc states of 218 important proteins, including PKM2, SHMT2, and IDH2. We not only developed a high-profile tool for predicting Ksucc sites, but also generated useful candidates for further experimental consideration. The online service of HybridSucc can be freely accessed for academic research at http://hybridsucc.biocuckoo.org/.
3. PTMD: A Database of Human Disease-associated Post-translational Modifications
Haodong XU ; Yongbo WANG ; Shaofeng LIN ; Wankun DENG ; Di PENG ; Qinghua CUI ; Yu XUE
Genomics, Proteomics & Bioinformatics 2018;16(4):244-251
Various posttranslational modifications (PTMs) participate in nearly all aspects of biological processes by regulating protein functions, and aberrant states of PTMs are frequently implicated in human diseases. Therefore, an integral resource of PTM-disease associations (PDAs) would be a great help for both academic research and clinical use. In this work, we reported PTMD, a well-curated database containing PTMs that are associated with human diseases. We manually collected 1950 known PDAs in 749 proteins for 23 types of PTMs and 275 types of diseases from the literature. Database analyses show that phosphorylation has the largest number of disease associations, whereas neurologic diseases have the largest number of PTM associations. We classified all known PDAs into six classes according to the PTM status in diseases and demonstrated that the upregulation and presence of PTM events account for a predominant proportion of disease-associated PTM events. By reconstructing a disease-gene network, we observed that breast cancers have the largest number of associated PTMs and AKT1 has the largest number of PTMs connected to diseases. Finally, the PTMD database was developed with detailed annotations and can be a useful resource for further analyzing the relations between PTMs and human diseases. PTMD is freely accessible at http://ptmd.biocuckoo.org.
Databases, Protein; Disease; genetics; Gene Regulatory Networks; Humans; Phosphorylation; Protein Processing, Post-Translational; Proteins; metabolism; Search Engine