SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data.
10.1016/j.gpb.2018.10.003
- Author:
Xianwen REN 1;
Liangtao ZHENG 2;
Zemin ZHANG 3
Author Information
1. BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China. Electronic address: renxwise@pku.edu.cn.
2. BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China.
3. BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China. Electronic address: zemin@pku.edu.cn.
- Publication Type:Journal Article
- Keywords:
Classification;
Clustering;
RNA-seq;
Single cell;
Subsampling
- MeSH:
Algorithms;
Animals;
Cluster Analysis;
Computational Biology/methods;
Databases as Topic;
Gene Expression Profiling/methods;
Humans;
Mice;
Sequence Analysis, RNA;
Single-Cell Analysis;
Software;
Statistics, Nonparametric
- From:
Genomics, Proteomics & Bioinformatics
2019;17(2):201-210
- Country:China
- Language:English
- Abstract:
Clustering is a prevalent means of analyzing single-cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New clustering methods that are both accurate and efficient are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework for large-scale scRNA-seq data based on random projection and feature construction. SSCC greatly improves the clustering accuracy, robustness, and computational efficiency of various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
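
The subsampling-clustering-classification idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reflect the API of the sscClust R package. All function names here (`ranks`, `pearson`, `kmeans`, `sscc_sketch`) are hypothetical. The sketch makes several assumptions: features are each cell's Spearman correlations to the subsampled cells, a plain k-means with farthest-point initialization stands in for whichever clustering algorithm is plugged in, and a nearest-centroid rule stands in for the classification step. The random projection mentioned in the abstract is not covered.

```python
import random


def ranks(values):
    """Average ranks (ties share the mean rank), as used by Spearman correlation."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; applied to rank vectors it yields Spearman's rho."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with farthest-point initialization; returns centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def sscc_sketch(cells, k, n_sub, seed=0):
    """cells: list of per-cell expression vectors. Returns one cluster label per cell."""
    rng = random.Random(seed)
    sub_idx = rng.sample(range(len(cells)), n_sub)        # 1. subsample cells
    ranked = [ranks(c) for c in cells]
    # Feature construction (assumed): Spearman correlation to each subsampled cell.
    features = [[pearson(r, ranked[j]) for j in sub_idx] for r in ranked]
    centroids = kmeans([features[i] for i in sub_idx], k)  # 2. cluster the subsample
    # 3. classify every cell by its nearest subsample-derived centroid
    return [min(range(k), key=lambda j: sq_dist(f, centroids[j])) for f in features]


if __name__ == "__main__":
    # Toy data: two groups of cells with opposite expression patterns over six genes.
    group_a = [[9, 8, 7, 1, 2, 3] for _ in range(20)]
    group_b = [[1, 2, 3, 9, 8, 7] for _ in range(20)]
    print(sscc_sketch(group_a + group_b, k=2, n_sub=10))
```

In this sketch only the subsample is clustered, so the clustering cost depends on `n_sub` rather than on the total number of cells; the remaining cells are labeled by a single pass of distance comparisons.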