GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting
- Author: Yu BIN¹; Chen CHENG; Zhou HONGYAN; Liu BINGQIANG; Ma QIN
- Author Information:
1. School of Life Sciences, University of Science and Technology of China, Hefei 230027, China; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Keywords: Protein-protein interaction; Feature fusion; L1-regularized logistic regression; Gradient tree boosting; Machine learning
- From: Genomics, Proteomics & Bioinformatics, 2020;18(5):582-592
- Country: China
- Language: Chinese
- Abstract:
Protein–protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new PPI prediction pipeline based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict PPIs in the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, in the one-core PPI network for CD9, and in the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
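The abstract describes a three-stage pipeline (feature fusion, L1-RLR feature selection, GTB classification) evaluated by five-fold cross-validation. The pure-Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation, which is available in the GitHub repository. Assumptions: fusion is plain concatenation of per-sample descriptor blocks; L1-RLR is fitted by proximal gradient descent with a hypothetical penalty `lam=0.1`; the boosted trees are depth-1 stumps with one Newton step per leaf, a simplification of full GTB; every function name and hyperparameter is hypothetical, and the toy data are random numbers, not PseAAC, PsePSSM, RSIV, or AD descriptors.

```python
import math
import random

def fuse(*blocks):
    """Serial fusion: concatenate each sample's descriptor vectors block by block."""
    return [[v for block in blocks for v in block[i]] for i in range(len(blocks[0]))]

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def soft(v, t):
    # Soft-thresholding: the proximal operator of the L1 penalty t * |v|.
    return math.copysign(max(abs(v) - t, 0.0), v)

def l1_logreg(X, y, lam=0.1, lr=0.1, epochs=300):
    """L1-regularized logistic regression fitted by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        b -= lr * gb  # the bias is not penalized
        w = [soft(wj - lr * g, lr * lam) for wj, g in zip(w, gw)]
    return w, b

def select_features(X, y, **kw):
    # Keep the indices of features whose L1-penalized weight is non-zero.
    w, _ = l1_logreg(X, y, **kw)
    return [j for j, wj in enumerate(w) if abs(wj) > 1e-8]

def fit_stump(X, r):
    """Depth-1 regression tree; maximizing L^2/n_L + R^2/n_R minimizes the SSE."""
    n, total, best = len(r), sum(r), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(n - 1):
            a, nxt = X[order[k]][j], X[order[k + 1]][j]
            left += r[order[k]]
            if a == nxt:
                continue  # no threshold separates equal values
            right = total - left
            score = left ** 2 / (k + 1) + right ** 2 / (n - k - 1)
            if best is None or score > best[0]:
                best = (score, j, (a + nxt) / 2)
    return None if best is None else best[1:]

def gtb_fit(X, y, rounds=50, lr=0.3):
    """Gradient tree boosting with stumps under the logistic (log) loss."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F, trees = [f0] * len(y), []
    for _ in range(rounds):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of the log loss
        stump = fit_stump(X, r)
        if stump is None:
            break
        j, t = stump
        leaf = {}
        for side in (True, False):
            idx = [i for i, x in enumerate(X) if (x[j] <= t) == side]
            num = sum(r[i] for i in idx)
            den = sum(p[i] * (1 - p[i]) for i in idx)
            leaf[side] = num / max(den, 1e-12)  # one Newton step per leaf
        trees.append((j, t, leaf[True], leaf[False]))
        F = [f + lr * leaf[x[j] <= t] for f, x in zip(F, X)]
    return f0, lr, trees

def gtb_predict(model, x):
    f0, lr, trees = model
    f = f0 + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in trees)
    return 1 if f > 0 else 0

def cross_val_accuracy(X, y, k=5, seed=0):
    """k-fold CV; L1-RLR selection is refitted on each training fold only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accs = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        tr = [i for i in idx if i not in held]
        ytr = [y[i] for i in tr]
        keep = select_features([X[i] for i in tr], ytr) or list(range(len(X[0])))
        model = gtb_fit([[X[i][j] for j in keep] for i in tr], ytr)
        hits = sum(gtb_predict(model, [X[i][j] for j in keep]) == y[i] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / k

if __name__ == "__main__":
    rng = random.Random(1)
    A = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(150)]  # informative block
    B = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(150)]  # pure-noise block
    y = [1 if a[0] + a[1] > 0 else 0 for a in A]
    print(f"5-fold accuracy on toy data: {cross_val_accuracy(fuse(A, B), y):.3f}")
```

Feature selection is refitted inside each training fold so that the held-out fold never influences which features are kept; selecting features on the full dataset before cross-validation would tend to inflate the accuracy estimate. In practice, library routines such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` combined with `SelectFromModel`, followed by `GradientBoostingClassifier`, would normally replace these hand-written functions.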