DeepCAGE: Incorporating Transcription Factors in Genome-wide Prediction of Chromatin Accessibility
- Author: Liu QIAO (1, 2); Hua KUI; Zhang XUEGONG; Wong Hung WING; Jiang RUI
- Author Information:
1. Ministry of Education Key Laboratory of Bioinformatics
2. Bioinformatics Division
- Keywords: Chromatin accessibility; Deep learning; Transcription factor; Gene expression
- From: Genomics, Proteomics & Bioinformatics 2022;20(3):496-507
- Country: China
- Language: Chinese
- Abstract:
Although computational approaches complement high-throughput biological experiments in identifying functional regions of the human genome, it remains a great challenge to systematically decipher the interactions between transcription factors (TFs) and regulatory elements and thereby achieve interpretable annotations of chromatin accessibility across diverse cellular contexts. To address this challenge, we propose DeepCAGE, a deep learning framework that integrates sequence information with the binding statuses of TFs to accurately predict chromatin accessible regions at a genome-wide scale in a variety of cell types. DeepCAGE uses a densely connected deep convolutional neural network to automatically learn sequence signatures of known chromatin accessible regions, and then combines these features with the expression levels and binding activities of human core TFs to predict novel chromatin accessible regions. In a series of systematic comparisons with existing methods, DeepCAGE outperforms them in both the classification and the regression of chromatin accessibility signals. In a detailed analysis of TF activities, DeepCAGE extracts novel binding motifs and quantifies the contribution of a TF to regulation at a specific locus in a given cell type. When applied to whole-genome sequencing data, our method prioritizes putative deleterious variants underlying a human complex trait and thus provides insights into disease-associated genetic variants.
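The abstract describes a two-stage data flow: a densely connected convolutional network extracts features from the DNA sequence, and those features are then combined with features derived from TF expression levels and binding activities. The pure-Python sketch below illustrates only that data flow on a toy scale; it is not the authors' implementation. The layer count, growth rate, kernel size, global max pooling, the choice to represent each TF as "expression level × motif binding score", the logistic output layer, and all function names (`dense_block`, `predict_accessibility`, etc.) are hypothetical, and the weights are random rather than trained.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one 4-channel vector (A, C, G, T) per position; other symbols (e.g. N) map to zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(x, weights, bias):
    """Zero-padded 1D convolution that keeps the sequence length, followed by ReLU.

    x: L x C_in, weights: C_out x K x C_in, bias: C_out -> returns L x C_out.
    """
    length, c_in, k = len(x), len(x[0]), len(weights[0])
    pad = k // 2
    out = []
    for i in range(length):
        row = []
        for w, b in zip(weights, bias):
            s = b
            for j in range(k):
                pos = i + j - pad
                if 0 <= pos < length:
                    s += sum(w[j][c] * x[pos][c] for c in range(c_in))
            row.append(max(0.0, s))
        out.append(row)
    return out


def dense_block(x, layers):
    """DenseNet-style block: each layer sees the channel-wise concatenation of
    the block input and the outputs of all earlier layers."""
    features = x
    for weights, bias in layers:
        new = conv1d_relu(features, weights, bias)
        features = [old + added for old, added in zip(features, new)]
    return features


def init_layer(rng, c_in, c_out, k):
    """Random (untrained) weights for one convolutional layer."""
    scale = 1.0 / math.sqrt(c_in * k)
    weights = [[[rng.uniform(-scale, scale) for _ in range(c_in)]
                for _ in range(k)] for _ in range(c_out)]
    return weights, [0.0] * c_out


def build_params(n_tfs, growth=4, n_layers=3, k=5, seed=0):
    """Hypothetical toy configuration; not DeepCAGE's published hyperparameters."""
    rng = random.Random(seed)
    layers, channels = [], len(BASES)
    for _ in range(n_layers):
        layers.append(init_layer(rng, channels, growth, k))
        channels += growth
    out = [rng.uniform(-0.1, 0.1) for _ in range(channels + n_tfs)]
    return {"dense": layers, "out": out, "bias": 0.0}


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict_accessibility(seq, tf_expression, tf_binding, params):
    """Probability that `seq` is accessible, from sequence features plus TF features."""
    features = dense_block(one_hot(seq), params["dense"])
    seq_vec = [max(channel) for channel in zip(*features)]  # global max pooling over positions
    # Assumption: each TF contributes expression level x motif binding score.
    tf_vec = [e * b for e, b in zip(tf_expression, tf_binding)]
    logit = params["bias"] + sum(w * v for w, v in zip(params["out"], seq_vec + tf_vec))
    return sigmoid(logit)


if __name__ == "__main__":
    params = build_params(n_tfs=3)
    p = predict_accessibility("ACGTTGCAAGGTCA", [2.0, 0.5, 1.0], [0.8, 0.1, 0.0], params)
    print(f"P(accessible) = {p:.3f}")
```

The concatenation inside `dense_block` is what makes the network "densely connected": with 4 input channels, 3 layers and a growth rate of 4, every position ends up with 4 + 3 × 4 = 16 channels, and the TF features are appended only after pooling, so the same sequence can receive different predictions in cell types with different TF expression.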