Machine Learning Modeling of Protein-intrinsic Features Predicts Tractability of Targeted Protein Degradation
- Author:
Wubing Zhang 1,2; Shourya S. Roy Burman; Jiaye Chen; Katherine A. Donovan; Yang Cao; Chelsea Shu; Boning Zhang; Zexian Zeng; Shengqing Gu; Yi Zhang; Dian Li; Eric S. Fischer; Collin Tokheim; X. Shirley Liu
Author Information
1. Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
2. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Keywords:
Targeted protein degradation;
Degradability;
Protein-intrinsic feature;
Ubiquitination;
Machine learning
- From:
Genomics, Proteomics & Bioinformatics
2022;20(5):882-898
- Country: China
- Language: Chinese
- Abstract:
Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell's endogenous protein degradation machinery. However, the susceptibility of proteins to targeting by TPD approaches, termed "degradability", is largely unknown. Here, we developed a machine learning model, model-free analysis of protein degradability (MAPD), to predict degradability from features intrinsic to protein targets. MAPD shows accurate performance in predicting kinases that are degradable by TPD compounds [with an area under the precision-recall curve (AUPRC) of 0.759 and an area under the receiver operating characteristic curve (AUROC) of 0.775] and is likely generalizable to independent non-kinase proteins. We found five features with statistical significance to achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome to find 964 disease-causing proteins (including proteins encoded by 278 cancer genes) that may be tractable to TPD drug development.
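
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how AUROC and AUPRC can be computed from binary labels and predicted scores. It is not the authors' code: the function names `auroc` and `average_precision`, the toy labels, and the toy scores are all hypothetical, and AUPRC is approximated here by step-wise average precision with no tie handling, which may differ from the interpolation used in the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = degradable, 0 = not degradable.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"AUROC = {auroc(toy_labels, toy_scores):.3f}")
print(f"AUPRC = {average_precision(toy_labels, toy_scores):.3f}")
```

AUPRC is sensitive to class balance, so it is usually read against the fraction of positives in the evaluation set, whereas an AUROC of 0.5 corresponds to random ranking regardless of balance.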