scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception
- Author:
Wan HUI (1); Chen LIANG; Deng MINGHUA
Author Information
1. School of Mathematical Sciences, Peking University, Beijing 100871, China
- Keywords:
Cell-type annotation;
Transfer learning;
Privacy preservation;
Single-cell RNA sequencing;
Gene expression
- From:
Genomics, Proteomics & Bioinformatics
2022;20(5):939-958
- Country: China
- Language: Chinese
- Abstract:
Current cell-type annotation tools for single-cell RNA sequencing (scRNA-seq) data mainly utilize well-annotated source data to help identify cell types in target data. However, on account of privacy preservation, their requirements for raw source data may not always be satisfied. In this case, achieving feature alignment between source and target data explicitly is impossible. Additionally, these methods are barely able to discover the presence of novel cell types. A subjective threshold is often selected by users to detect novel cells. We propose a universal annotation framework for scRNA-seq data called scEMAIL, which automatically detects novel cell types without accessing source data during adaptation. For new cell-type identification, a novel cell-type perception module is designed with three steps. First, an expert ensemble system measures uncertainty of each cell from three complementary aspects. Second, based on this measurement, bimodality tests are applied to detect the presence of new cell types. Third, once assured of their presence, an adaptive threshold via manifold mixup partitions target cells into "known" and "unknown" groups. Model adaptation is then conducted to alleviate the batch effect. We gather multi-order neighborhood messages globally and impose local affinity regularizations on "known" cells. These constraints mitigate wrong classifications of the source model via reliable self-supervised information of neighbors. scEMAIL is accurate and robust under various scenarios in both simulation and real data. It is also flexible to be applied to challenging single-cell ATAC-seq data without loss of superiority. The source code of scEMAIL can be accessed at https://github.com/aster-ww/scEMAIL and https://ngdc.cncb.ac.cn/biocode/tools/BT007335/releases/v1.0.
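
The three-step perception idea in the abstract (score per-cell uncertainty, check whether the scores look bimodal, then split cells at a threshold) can be illustrated with a small sketch. This is not the scEMAIL implementation and substitutes simpler stand-ins throughout: normalized entropy is the only uncertainty score (the paper uses a three-expert ensemble), Sarle's bimodality coefficient in its large-sample form stands in for the paper's bimodality tests, and an Otsu-style split replaces the manifold-mixup adaptive threshold. All function names and the toy probabilities are hypothetical.

```python
import math


def entropy_uncertainty(probs):
    """Normalized Shannon entropy of one cell's class probabilities, in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def bimodality_coefficient(x):
    """Sarle's coefficient, large-sample form: (skew^2 + 1) / kurtosis.

    Values above 5/9 (the value for a uniform distribution) hint at bimodality.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 < 1e-12:  # (near-)constant scores: treat as unimodal
        return 0.0
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # non-excess kurtosis
    return (skew ** 2 + 1.0) / kurt


def otsu_threshold(x):
    """Cut point that maximizes the between-group variance of a 1-D sample."""
    xs = sorted(x)
    n, total = len(xs), sum(xs)
    best_t, best_var, left_sum = xs[0], -1.0, 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # only cut between distinct values
        mu0 = left_sum / i
        mu1 = (total - left_sum) / (n - i)
        var = (i / n) * (1 - i / n) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, (xs[i - 1] + xs[i]) / 2
    return best_t


def split_known_unknown(prob_rows, bc_cutoff=5 / 9):
    """Label each cell 'known' or 'unknown' from source-model probabilities."""
    scores = [entropy_uncertainty(p) for p in prob_rows]
    if bimodality_coefficient(scores) <= bc_cutoff:
        return ["known"] * len(scores)  # no evidence of a novel group
    t = otsu_threshold(scores)
    return ["unknown" if s > t else "known" for s in scores]


# Toy input (assumed): 80 confidently classified cells, 20 ambiguous ones.
toy_probs = [[0.97, 0.02, 0.01]] * 80 + [[0.34, 0.33, 0.33]] * 20
labels = split_known_unknown(toy_probs)
print(labels.count("known"), labels.count("unknown"))
```

A fixed cutoff such as 5/9 is itself a heuristic; the abstract's point is that the split should be driven by the data rather than by a threshold the user picks by hand, which is what the bimodality check followed by a data-derived cut point is meant to convey here.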