Performance comparison of 5 automatic cell type annotation methods in scRNA-seq data
10.3760/cma.j.cn231583-20251011-00342
- VernacularTitle:scRNA-seq数据中5种自动细胞类型注释方法的性能比较
- Author:
Jinghui NI 1; Yu GAO 1; Qiyue CHEN 1; Ying ZHANG 1; Yan LIU 1
Author Information
1. Department of Health Statistics, School of Public Health, Harbin Medical University, Harbin 150081
- Publication Type:Journal Article
- Keywords:
Cell annotation;
Single-cell sequencing data;
Deep learning;
Attention mechanism;
Non-small cell lung cancer
- From:
Chinese Journal of Endemiology
2025;44(11):931-936
- Country:China
- Language:Chinese
- Abstract:
Objective:To evaluate the performance of five automatic cell type annotation methods on single-cell RNA sequencing (scRNA-seq) data.
Methods:Simulated data were generated with the Splatter package in R, varying two data characteristics: the number of cells and the number of genes. The real data came from the GSE10245 non-small cell lung cancer scRNA-seq dataset in the Gene Expression Omnibus (GEO) database; these data had been pre-processed and batch effects removed. Five methods were compared: automated cell type identification using neural networks (ACTINN), the deep learning-based single-cell type annotation method scDeepSort, the reference-transcriptome scRNA-seq annotation R package SingleR, the cross-platform and cross-species scRNA-seq data classifier SingleCellNet, and the cross-dataset projection method scMap-cell; the neural network-based ACTINN was implemented with the TensorFlow library in Python. Annotation performance was evaluated with accuracy (ACC), F1-score, and the Matthews correlation coefficient (MCC). Each method was validated with ten-fold cross-validation, and metrics were averaged over 50 repeated runs for between-method comparison. Dunnett's t-test from the R DescTools package was used for multiple comparisons between ACTINN and each of the other four methods.
Results:Under 12 scenarios (3 levels of cell number × 4 levels of gene number), simulated-data analysis showed that, compared with scDeepSort, SingleR, SingleCellNet, and scMap-cell, ACTINN increased ACC by 3.31% - 14.59%, 1.38% - 13.03%, 12.98% - 25.25%, and 20.72% - 29.62%, respectively; F1-score by 2.75% - 22.74%, 2.46% - 23.68%, 5.07% - 27.47%, and 10.27% - 31.47%, respectively; and MCC by 3.42% - 9.75%, 2.26% - 7.61%, 5.41% - 11.11%, and 8.27% - 15.22%, respectively.
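The three evaluation metrics named in the Methods (ACC, macro-averaged F1-score, and multiclass MCC) can be sketched with scikit-learn; the cell-type labels below are illustrative assumptions, not data from the study.

```python
# Sketch of the study's evaluation metrics (ACC, F1-score, MCC) computed
# with scikit-learn on toy predicted cell-type labels (illustrative only).
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = ["T", "T", "B", "B", "NK", "NK", "T", "B"]   # reference annotations
y_pred = ["T", "T", "B", "NK", "NK", "NK", "T", "B"]  # method's predictions

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")  # macro-averaged over cell types
mcc = matthews_corrcoef(y_true, y_pred)         # multiclass generalization of MCC
print(f"ACC={acc:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
# → ACC=0.875  F1=0.867  MCC=0.833
```

Macro averaging weights each cell type equally, which matters when rare cell populations would otherwise be swamped by abundant ones.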
Real-data analysis showed that the ACC of ACTINN was 81.0%, higher than those of the above four methods by 2.1%, 5.2%, 7.9%, and 8.9%, respectively; its F1-score was 80.5%, higher by 2.3%, 5.9%, 2.4%, and 6.0%, respectively; and its MCC was 83.3%, higher by 0.9%, 2.5%, 3.4%, and 11.2%, respectively. Dunnett's t-test showed no statistically significant difference between scDeepSort and ACTINN in ACC ( P = 0.821) or F1-score ( P = 0.498), nor between scDeepSort or SingleCellNet and ACTINN in MCC ( P = 0.904 and 0.134); all other multiple comparisons were statistically significant ( P < 0.05).
Conclusions:ACTINN and scDeepSort perform well in cell type annotation, with ACTINN showing outstanding performance and SingleR showing robust performance, while SingleCellNet and scMap-cell perform relatively poorly. This suggests that self-attention algorithms based on the Transformer framework may drive further development of automatic cell annotation methods.