DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning
- Author:
Li ZHONGXIAO 1; Li YISHENG; Zhang BIN; Li YU; Long YONGKANG; Zhou JUEXIAO; Zou XUDONG; Zhang MIN; Hu YUHUI; Chen WEI; Gao XIN
Author Information
1. King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Keywords:
Polyadenylation; Gene regulation; Sequence analysis; Deep learning; Bioinformatics
- From:
Genomics, Proteomics & Bioinformatics
2022;20(3):483-495
- Country: China
- Language: Chinese
- Abstract:
Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic studies have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, Deep Regulatory Code and Tools for Alternative Polyadenylation (DeeReCT-APA), to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a convolutional neural network-long short-term memory (CNN-LSTM) architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can quantitatively predict the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and sheds light on future mechanistic understanding in APA regulation.
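
The idea of a variable-length quantitative target can be illustrated with a small sketch. It is not the DeeReCT-APA implementation: the raw per-PAS scores below are made-up numbers standing in for the outputs of the CNN-LSTM, and the softmax normalization is an assumption about how scores might be turned into percentages that sum to 100 within a gene. All function names are hypothetical. The sketch also shows how the three evaluation tasks named in the abstract (pairwise comparison, highest usage prediction, ranking) can each be read off a single quantitative prediction.

```python
import math
from itertools import combinations


def usage_percentages(scores):
    """Normalize one gene's per-PAS scores into usage levels summing to 100.

    Assumption: a softmax is used here for illustration; the abstract only
    states that the model outputs percentage scores for all PASs of a gene.
    """
    if not scores:
        return []
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def highest_usage_site(usage):
    """Index of the PAS with the highest predicted usage."""
    return max(range(len(usage)), key=usage.__getitem__)


def ranking(usage):
    """PAS indices ordered from highest to lowest predicted usage."""
    return sorted(range(len(usage)), key=lambda i: usage[i], reverse=True)


def pairwise_comparisons(usage):
    """For every pair (i, j) with i < j: is PAS i predicted to be used more than PAS j?"""
    return {(i, j): usage[i] > usage[j]
            for i, j in combinations(range(len(usage)), 2)}


# Two hypothetical genes with different numbers of PASs (3 and 2).
gene_a = usage_percentages([2.0, 0.5, -1.0])
gene_b = usage_percentages([0.1, 1.3])

print([round(u, 1) for u in gene_a], ranking(gene_a))
print([round(u, 1) for u in gene_b], pairwise_comparisons(gene_b))
```

Because the normalization is applied per gene, the same functions handle genes with any number of PASs, and raising the score of one site necessarily lowers the predicted usage of the others, which is one simple way to express competition among sites.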