1. Compositional Variability and Mutation Spectra of Monophyletic SARS-CoV-2 Clades
Teng XUFEI ; Li QIANPENG ; Li ZHAO ; Zhang YUANSHENG ; Niu GUANGYI ; Xiao JINGFA ; Yu JUN ; Zhang ZHANG ; Song SHUHUI
Genomics, Proteomics & Bioinformatics 2020;18(6):648-663
COVID-19 and its causative pathogen SARS-CoV-2 have rushed the world into a staggering pandemic in a few months, and a global fight against both has been intensifying. Here, we describe an analysis procedure where genome composition and its variables are related, through the genetic code, to molecular mechanisms, based on understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, identity-based phylogeny based on 22,051 SARS-CoV-2 sequences, and evaluation of sequence variation patterns as mutation spectra and its 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are listed as follows: 1) The most dominant mutation is C-to-U permutation, whose abundant second-codon-position counts alter amino acid composition toward higher molecular weight and lower hydrophobicity, albeit assumed most slightly deleterious. 2) The second abundance group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and a positive-strand mutation (G-to-U) due to DNA repair mechanisms after cellular abasic events. 3) A clade-associated biased mutation trend is found attributable to elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is very informative for associating non-synonymous mutations and viral proteome changes. These findings demand a platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, to provide biological and clinical information after logical convergence for effective pharmaceutical and diagnostic applications. Such actions are in desperate need, especially in the middle of the War against COVID-19.
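The "12 permutations" named in the abstract above are the directed single-base substitutions among A, C, G, and U (4 bases times 3 alternatives). As a minimal sketch only, and not the authors' pipeline, the tally below counts them between two pre-aligned sequences; the function name `mutation_spectrum` and the toy sequences are hypothetical, and the real analysis works per clade on thousands of genomes.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"
# The 12 directed base substitutions (4 bases x 3 alternatives).
PERMUTATIONS = [f"{ref}-to-{alt}" for ref, alt in permutations(BASES, 2)]


def mutation_spectrum(reference, sample):
    """Count each directed substitution between two aligned, equal-length sequences.

    Hypothetical helper for illustration: gaps and ambiguous bases are skipped,
    and T is treated as U so DNA-style input is accepted.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    ref_rna = reference.upper().replace("T", "U")
    alt_rna = sample.upper().replace("T", "U")
    counts = Counter({name: 0 for name in PERMUTATIONS})
    for ref, alt in zip(ref_rna, alt_rna):
        if ref != alt and ref in BASES and alt in BASES:
            counts[f"{ref}-to-{alt}"] += 1
    return counts


# Toy example (made-up sequences): two C-to-U changes, nothing else.
spectrum = mutation_spectrum("AUGCCGUAC", "AUGUCGUAU")
```

Keeping all 12 keys in the result, including zero counts, makes spectra from different clades directly comparable.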
2. The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR
Song SHUHUI ; Ma LINA ; Zou DONG ; Tian DONGMEI ; Li CUIPING ; Zhu JUNWEI ; Chen MEILI ; Wang ANKE ; Ma YINGKE ; Li MENGWEI ; Teng XUFEI ; Cui YING ; Duan GUANGYA ; Zhang MOCHEN ; Jin TONG ; Shi CHENGMIN ; Du ZHENGLIN ; Zhang YADONG ; Liu CHUANDONG ; Li RUJIAO ; Zeng JINGYAO ; Hao LILI ; Jiang SHUAI ; Chen HUA ; Han DALI ; Xiao JINGFA ; Zhang ZHANG ; Zhao WENMING ; Xue YONGBIAO ; Bao YIMING
Genomics, Proteomics & Bioinformatics 2020;18(6):749-759
On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and congregates the quality score, functional annotation, and population frequency for each variant. Spatiotemporal change for each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making 2019nCoVR a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.