1. CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing.
Guoguang ZHAO ; Dechao BU ; Changning LIU ; Jing LI ; Jian YANG ; Zhiyong LIU ; Yi ZHAO ; Runsheng CHEN
Protein & Cell 2012;3(2):148-152
Estimating taxonomic content is a key problem in the analysis of metagenomic sequencing data. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel lowest common ancestor (LCA) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) achieves linear speedup as the number of processors increases, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster of ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to five times faster, and its peak memory usage is approximately 18.5% of MEGAN's when both run on a fat node. CloudLCA can run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis; it generates the same output as MEGAN, which can be imported directly into MEGAN to complete the downstream analysis. Moreover, CloudLCA is a general solution for finding the lowest common ancestor and can be applied in other fields that require an LCA algorithm.
Algorithms ; Databases, Genetic ; Metagenomics ; Search Engine ; User-Computer Interface
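In MEGAN-style taxonomic binning, each read is assigned to the lowest common ancestor of all taxa it hits, and this per-read step is what the CloudLCA abstract describes accelerating. The following is a minimal sequential sketch of that step in Python, not CloudLCA's implementation: it assumes a hypothetical taxonomy stored as a child-to-parent dictionary and a precomputed mapping from each read to the taxa of its hits, and it omits any score-based filtering of hits. The function names are illustrative.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy: child taxon -> parent taxon; the root maps to None.
Taxonomy = Dict[str, Optional[str]]


def path_to_root(taxonomy: Taxonomy, taxon: str) -> List[str]:
    """Return the lineage of `taxon`, ordered from the root down to the taxon itself."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = taxonomy[node]
    return path[::-1]


def lowest_common_ancestor(taxonomy: Taxonomy, taxa: Iterable[str]) -> str:
    """LCA of all taxa hit by one read: the deepest node shared by every lineage."""
    paths = [path_to_root(taxonomy, t) for t in taxa]
    if not paths:
        raise ValueError("a read needs at least one hit to be assigned")
    lca = paths[0][0]  # every lineage starts at the root
    for level in zip(*paths):  # walk down level by level while all lineages agree
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


def assign_reads(taxonomy: Taxonomy, hits: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign each read independently; this per-read map is the part that parallelizes."""
    return {read: lowest_common_ancestor(taxonomy, taxa) for read, taxa in hits.items()}


if __name__ == "__main__":
    toy_taxonomy: Taxonomy = {
        "root": None,
        "Bacteria": "root",
        "Proteobacteria": "Bacteria",
        "Escherichia coli": "Proteobacteria",
        "Salmonella enterica": "Proteobacteria",
        "Firmicutes": "Bacteria",
        "Bacillus subtilis": "Firmicutes",
    }
    toy_hits = {
        "read1": ["Escherichia coli", "Salmonella enterica"],
        "read2": ["Escherichia coli", "Bacillus subtilis"],
        "read3": ["Bacillus subtilis"],
    }
    print(assign_reads(toy_taxonomy, toy_hits))
```

Run as a script, this prints {'read1': 'Proteobacteria', 'read2': 'Bacteria', 'read3': 'Bacillus subtilis'}. Because each read is assigned independently, the comprehension in assign_reads is an embarrassingly parallel map that could be split across processes or nodes; how CloudLCA actually distributes this work is not described in the abstract.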
2. Single-cell Long Non-coding RNA Landscape of T Cells in Human Cancer Immunity
Haitao LUO ; Dechao BU ; Lijuan SHAO ; Yang LI ; Liang SUN ; Ce WANG ; Jing WANG ; Wei YANG ; Xiaofei YANG ; Jun DONG ; Yi ZHAO ; Furong LI
Genomics, Proteomics & Bioinformatics 2021;19(3):377-393
The development of new biomarkers or therapeutic targets for cancer immunotherapies requires a deep understanding of T cells. To date, a complete landscape and systematic characterization of long noncoding RNAs (lncRNAs) in T cells in cancer immunity have been lacking. Here, by systematically analyzing full-length single-cell RNA sequencing (scRNA-seq) data from more than 20,000 T cell libraries across three cancer types, we provide the first comprehensive catalog and functional repertoire of lncRNAs in human T cells. Specifically, we developed a custom pipeline for de novo transcriptome assembly and obtained a novel lncRNA catalog containing 9433 genes. This increased the size of the current human lncRNA catalog by 16% and nearly doubled the number of lncRNAs expressed in T cells. We found that a portion of the genes expressed in single T cells were lncRNAs, which had been overlooked by most previous studies. Based on metacell maps constructed with the MetaCell algorithm, which partitions scRNA-seq datasets into disjoint, homogeneous groups of cells (metacells), we identified 154 signature lncRNA genes associated with effector, exhausted, and regulatory T cell states. Moreover, 84 of them were functionally annotated based on co-expression networks, indicating that lncRNAs might broadly participate in the regulation of T cell functions. Our findings provide a new perspective and resource for investigating the mechanisms of T cell regulation in cancer immunity, as well as for developing novel cancer-immune biomarkers and cancer immunotherapies.
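The abstract does not detail how the co-expression networks were used to annotate lncRNAs. A common approach, shown here as a hypothetical sketch rather than the authors' method, is guilt by association: correlate a lncRNA's expression profile (for example, across metacells) with those of protein-coding genes, keep partners whose Pearson correlation reaches an assumed threshold min_r, and rank functional labels by how many linked partners carry them. The function names, the threshold, the toy profiles, and the labels are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpressed_partners(
    lnc_profile: Sequence[float],
    coding_profiles: Dict[str, Sequence[float]],
    min_r: float = 0.5,  # hypothetical edge threshold for the co-expression network
) -> List[Tuple[str, float]]:
    """Coding genes linked to the lncRNA (r >= min_r), strongest correlation first."""
    scored = [(gene, pearson(lnc_profile, prof)) for gene, prof in coding_profiles.items()]
    return sorted((gr for gr in scored if gr[1] >= min_r), key=lambda gr: gr[1], reverse=True)


def annotate_by_association(
    partners: List[Tuple[str, float]],
    gene_functions: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Rank functional labels by how many co-expressed partners carry them."""
    counts: Counter = Counter()
    for gene, _ in partners:
        counts.update(gene_functions.get(gene, []))
    return counts.most_common()


if __name__ == "__main__":
    # Toy profiles over five hypothetical metacells.
    lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
    coding = {
        "IL7R": [5.0, 4.0, 3.0, 2.0, 1.0],
        "PDCD1": [2.0, 4.0, 6.0, 8.0, 10.0],
        "HAVCR2": [1.0, 2.0, 3.0, 5.0, 4.0],
    }
    # Illustrative labels only, not curated annotations.
    functions = {
        "IL7R": ["T cell homeostasis"],
        "PDCD1": ["T cell exhaustion"],
        "HAVCR2": ["T cell exhaustion", "immune checkpoint"],
    }
    partners = coexpressed_partners(lnc, coding)
    print(partners)
    print(annotate_by_association(partners, functions))
```

In this toy run, the lncRNA links to PDCD1 (r = 1.0) and HAVCR2 (r = 0.9) but not to the anticorrelated IL7R, so "T cell exhaustion" ranks first. A real analysis would also need multiple-testing control and a curated annotation source such as Gene Ontology, which this sketch leaves out.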