1. Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data
Huang QIANHUI ; Liu YU ; Du YUHENG ; Garmire LANA X.
Genomics, Proteomics & Bioinformatics 2021;19(2):267-281
Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Some supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether some classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research, including Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two methods were repurposed from deconvoluting DNA methylation data, i.e., linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed the accuracy through intra-dataset and inter-dataset predictions; the robustness over practical challenges such as gene filtering, high similarity among cell types, and increased cell type classes; as well as the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat did have a major drawback at predicting rare cell populations, and it was suboptimal at differentiating cell types highly similar to each other, compared to SingleR and RPC. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
2. Computational Methods for Single-cell Multi-omics Integration and Alignment
Stanojevic STEFAN ; Li YIJUN ; Ristivojevic ALEKSANDAR ; Garmire LANA X.
Genomics, Proteomics & Bioinformatics 2022;20(5):836-849
Recently developed technologies to generate single-cell genomic data have made a revolutionary impact in the field of biology. Multi-omics assays offer even greater opportunities to understand cellular states and biological processes. The problem of integrating different omics data with very different dimensionality and statistical properties remains, however, quite challenging. A growing body of computational tools is being developed for this task, leveraging ideas ranging from machine translation to the theory of networks, and represents another frontier on the interface of biology and data science. Our goal in this review is to provide a comprehensive, up-to-date survey of computational techniques for the integration of single-cell multi-omics data, while making the concepts behind each algorithm approachable to a non-expert audience.