1. CoBRA: Containerized Bioinformatics Workflow for Reproducible ChIP/ATAC-seq Analysis
Qiu XINTAO ; S.Feit AVERY ; Feiglin ARIEL ; Xie YINGTIAN ; Kesten NIKOLAS ; Taing LEN ; Perkins JOSEPH ; Gu SHENGQING ; Li YIHAO ; Cejas PALOMA ; Zhou NINGXUAN ; Jeselsohn RINATH ; Brown MYLES ; Liu X.SHIRLEY ; W.Long HENRY
Genomics, Proteomics & Bioinformatics 2021;19(4):652-661
Chromatin immunoprecipitation sequencing (ChIP-seq) and the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) have become essential technologies to effectively measure protein–DNA interactions and chromatin accessibility. However, there is a need for a scalable and reproducible pipeline that incorporates proper normalization between samples, correction of copy number variations, and integration of new downstream analysis tools. Here we present Containerized Bioinformatics workflow for Reproducible ChIP/ATAC-seq Analysis (CoBRA), a modularized computational workflow which quantifies ChIP-seq and ATAC-seq peak regions and performs unsupervised and supervised analyses. CoBRA provides a comprehensive state-of-the-art ChIP-seq and ATAC-seq analysis pipeline that can be used by scientists with limited computational experience. This enables researchers to gain rapid insight into protein–DNA interactions and chromatin accessibility through sample clustering, differential peak calling, motif enrichment, comparison of sites to a reference database, and pathway analysis. CoBRA is publicly available online at https://bitbucket.org/cfce/cobra.
2. Machine Learning Modeling of Protein-intrinsic Features Predicts Tractability of Targeted Protein Degradation
Zhang WUBING ; Burman S.Roy SHOURYA ; Chen JIAYE ; A.Donovan KATHERINE ; Cao YANG ; Shu CHELSEA ; Zhang BONING ; Zeng ZEXIAN ; Gu SHENGQING ; Zhang YI ; Li DIAN ; S.Fischer ERIC ; Tokheim COLLIN ; Liu X.SHIRLEY
Genomics, Proteomics & Bioinformatics 2022;20(5):882-898
Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell's endogenous protein degradation machinery. However, the susceptibility of proteins for targeting by TPD approaches, termed "degradability", is largely unknown. Here, we developed a machine learning model, model-free analysis of protein degradability (MAPD), to predict degradability from features intrinsic to protein targets. MAPD shows accurate performance in predicting kinases that are degradable by TPD compounds [with an area under the precision-recall curve (AUPRC) of 0.759 and an area under the receiver operating characteristic curve (AUROC) of 0.775] and is likely generalizable to independent non-kinase proteins. We found five features with statistical significance to achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome to find 964 disease-causing proteins (including proteins encoded by 278 cancer genes) that may be tractable to TPD drug development.