- VernacularTitle:tRF Prospect: a target prediction algorithm for tRNA-derived fragments based on neural network learning
- Author: Dai-Xi REN 1; Jian-Yong YI 2; Yong-Zhen MO 1; Mei YANG 1; Wei XIONG 1; Zhao-Yang ZENG 1; Lei SHI 3
- Publication Type:Journal Article
- Keywords: tRF; database; neural network learning
- From: Progress in Biochemistry and Biophysics 2025;52(9):2428-2438
- Country:China
- Language:Chinese
- Abstract: Objective: Transfer RNA-derived fragments (tRFs) are a recently characterized and rapidly expanding class of small non-coding RNAs, typically 13 to 50 nucleotides in length. They are derived from mature or precursor tRNA molecules through specific cleavage events and have been implicated in a wide range of cellular processes. Increasing evidence indicates that tRFs play important regulatory roles in gene expression, primarily by interacting with target messenger RNAs (mRNAs) to induce transcript degradation, in a manner partially analogous to microRNAs (miRNAs). However, despite their emerging biological relevance and potential roles in disease mechanisms, there remains a significant lack of computational tools capable of systematically predicting the interaction landscape between tRFs and their target mRNAs. Existing databases often rely on limited interaction features and lack the flexibility to accommodate novel or user-defined tRF sequences. The primary goal of this study was to develop a machine-learning-based prediction algorithm that enables high-throughput, accurate identification of tRF:mRNA binding events, thereby facilitating the functional analysis of tRF regulatory networks. Methods: We began by assembling a manually curated dataset of 38 687 experimentally verified tRF:mRNA interaction pairs and extracting seven biologically informed features for each pair: (1) AU content of the binding site, (2) site pairing status, (3) binding region location, (4) number of binding sites per mRNA, (5) length of the longest consecutive complementary stretch, (6) total binding region length, and (7) seed sequence complementarity. Using this dataset and feature set, we trained four distinct machine learning classifiers: logistic regression, random forest, decision tree, and a multilayer perceptron (MLP), and compared their ability to discriminate true interactions from non-interactions. 
Each model’s performance was evaluated using overall accuracy, receiver operating characteristic (ROC) curves, and the corresponding area under the ROC curve (AUC). The MLP consistently achieved the highest AUC among the four and was therefore selected as the backbone of our prediction framework, which we named tRF Prospect. For biological validation, we retrieved three high-throughput RNA-seq datasets from the Gene Expression Omnibus (GEO) in which individual tRFs were overexpressed: AS-tDR-007333 (GSE184690), tRF-3004b (GSE197091), and tRF-20-S998LO9D (GSE208381). Differential expression analysis of each dataset identified genes downregulated upon tRF overexpression, which we designated as putative targets. We then compared the predictions generated by tRF Prospect against those from three established tools (tRFTar, tRForest, and tRFTarget) by quantifying the number of predicted targets for each tRF and assessing concordance with the experimentally derived gene sets. Results: The proposed algorithm achieved high predictive accuracy, with an AUC of 0.934. Functional validation was conducted using transcriptome-wide RNA-seq datasets from cells overexpressing specific tRFs, confirming the model’s ability to accurately predict biologically relevant downregulation of mRNA targets. When benchmarked against established tools such as tRFTar, tRForest, and tRFTarget, tRF Prospect consistently demonstrated superior performance, both in predictive precision and sensitivity and in identifying a higher number of true-positive interactions. Moreover, unlike static databases that are limited to precomputed results, tRF Prospect supports real-time prediction for any user-defined tRF sequence, enhancing its applicability in exploratory and hypothesis-driven research. Conclusion: This study introduces tRF Prospect as a powerful and flexible computational tool for investigating tRF:mRNA interactions. 
By leveraging the predictive strength of deep learning and incorporating a broad spectrum of interaction-relevant features, it addresses key limitations of existing platforms. Specifically, tRF Prospect: (1) expands the range of detectable tRF and target types; (2) improves prediction accuracy through a multilayer perceptron model; and (3) allows for dynamic, user-driven analysis beyond database constraints. Although the current version emphasizes miRNA-like repression mechanisms and faces challenges in accurately capturing 5' UTR-associated binding events, it nonetheless provides a critical foundation for future studies aiming to unravel the complex roles of tRFs in gene regulation, cellular function, and disease pathogenesis.
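
Three of the sequence-level features named in the Methods (AU content of the binding site, length of the longest consecutive complementary stretch, and seed sequence complementarity) can be illustrated with a minimal Python sketch. The abstract does not give exact definitions, so everything below is an assumption made for illustration: the function names are hypothetical, pairing is restricted to Watson-Crick pairs (no G:U wobble), the duplex is treated as ungapped and antiparallel, and the seed is taken to be tRF positions 2-8 by analogy with miRNAs. This is not the published tRF Prospect implementation.

```python
# Illustrative sketch only. Feature definitions are assumptions, not the
# published tRF Prospect code.

# Assumption: only canonical Watson-Crick pairs count as complementary.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def _to_rna(seq):
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def au_content(site):
    """Fraction of A/U bases in a binding-site sequence (0.0 if empty)."""
    site = _to_rna(site)
    if not site:
        return 0.0
    return sum(base in "AU" for base in site) / len(site)


def pairing_mask(trf, site):
    """Per-position pairing between a tRF and a target site.

    Both sequences are given 5'->3'. The site is reversed so that tRF
    position i faces the i-th base counted from the site's 3' end
    (antiparallel, ungapped alignment; an assumption of this sketch).
    """
    site_rev = _to_rna(site)[::-1]
    return [(a, b) in WATSON_CRICK for a, b in zip(_to_rna(trf), site_rev)]


def longest_complementary_stretch(mask):
    """Length of the longest run of consecutive paired positions."""
    best = run = 0
    for paired in mask:
        run = run + 1 if paired else 0
        best = max(best, run)
    return best


def seed_complementary(mask, start=1, end=8):
    """True if every position in the assumed seed (tRF positions 2-8,
    i.e. the 0-based slice start:end) is paired."""
    seed = mask[start:end]
    return len(seed) == end - start and all(seed)


# Hypothetical 10-nt tRF and a perfectly complementary target site.
example_trf = "GCAUUGGUGG"
example_site = "CCACCAAUGC"
example_mask = pairing_mask(example_trf, example_site)
print(au_content(example_site))
print(longest_complementary_stretch(example_mask))
print(seed_complementary(example_mask))
```

In a pipeline like the one described, values such as these would be computed for each candidate tRF:mRNA pair and passed, together with the remaining features, to the classifier. How the authors locate candidate binding sites and encode the other four features is not stated in the abstract.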