Compositional Variability and Mutation Spectra of Monophyletic SARS-CoV-2 Clades
- Author: Teng XUFEI (1, 2, 3); Li QIANPENG; Li ZHAO; Zhang YUANSHENG; Niu GUANGYI; Xiao JINGFA; Yu JUN; Zhang ZHANG; Song SHUHUI
- Author Information:
1. China National Center for Bioinformation, Beijing 100101, China
2. National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
3. University of Chinese Academy of Sciences, Beijing 100049, China
- Keywords: SARS-CoV-2; Nucleotide composition; Mutation spectrum; Viral replication
- From: Genomics, Proteomics & Bioinformatics, 2020;18(6):648-663
- Country: China
- Language: Chinese
- Abstract:
COVID-19 and its causative pathogen, SARS-CoV-2, have plunged the world into a staggering pandemic within a few months, and the global fight against both has been intensifying. Here, we describe an analysis procedure in which genome composition and its variables are related, through the genetic code, to molecular mechanisms, based on an understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity, including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, an identity-based phylogeny of 22,051 SARS-CoV-2 sequences, and an evaluation of sequence variation patterns as mutation spectra and their 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are as follows: 1) The dominant mutation is the C-to-U permutation, whose abundant second-codon-position counts shift amino acid composition toward higher molecular weight and lower hydrophobicity, although most such changes are assumed to be slightly deleterious. 2) The second most abundant group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and one positive-strand mutation (G-to-U), owing to DNA repair mechanisms after cellular abasic events. 3) A clade-associated mutation bias is found, attributable to an elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is highly informative for associating non-synonymous mutations with viral proteome changes. These findings call for a platform on which emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, providing biological and clinical information, after logical convergence, for effective pharmaceutical and diagnostic applications. Such efforts are urgently needed, especially in the midst of the war against COVID-19.
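The "12 permutations" of a mutation spectrum presumably correspond to the 12 ordered base substitutions possible over a four-letter alphabet (4 bases × 3 alternatives, e.g. C-to-U versus U-to-C). The Python below is a minimal sketch of how such a spectrum, together with codon-position counts, could be tallied between one reference and one pre-aligned variant. It is not the authors' pipeline. The function name `mutation_spectrum`, the single reading frame starting at `cds_start`, and the choice to skip gaps and ambiguous bases are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"
# The 12 ordered substitution types X>Y (X != Y) over the RNA alphabet.
SUBSTITUTIONS = [f"{a}>{b}" for a, b in permutations(BASES, 2)]


def mutation_spectrum(reference, variant, cds_start=0):
    """Tally base substitutions between two pre-aligned sequences.

    Hypothetical helper (not from the paper). Returns (spectrum, codon_positions):
      spectrum        -- Counter over all 12 substitution types (zeros kept)
      codon_positions -- Counter keyed by (substitution, codon position 1-3),
                         assuming ONE reading frame that starts at cds_start
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    # Accept DNA-style input by mapping T to U.
    ref_seq = reference.upper().replace("T", "U")
    alt_seq = variant.upper().replace("T", "U")
    spectrum = Counter({s: 0 for s in SUBSTITUTIONS})
    codon_positions = Counter()
    for i, (ref, alt) in enumerate(zip(ref_seq, alt_seq)):
        # Skip identical sites, alignment gaps ("-") and ambiguous bases ("N").
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        key = f"{ref}>{alt}"
        spectrum[key] += 1
        if i >= cds_start:
            codon_positions[(key, (i - cds_start) % 3 + 1)] += 1
    return spectrum, codon_positions


# Toy example: two C-to-U changes in codons CCU (Pro) and UCA (Ser).
spectrum, positions = mutation_spectrum("AUGCCUUCAGGA", "AUGCUUUUAGGA")
print(spectrum["C>U"], positions[("C>U", 2)])  # prints: 2 2
```

In the toy pair, both substitutions fall at the second codon position (CCU to CUU, Pro to Leu; UCA to UUA, Ser to Leu), each replacing a residue with a heavier one. A real analysis would aggregate over thousands of aligned genomes and use annotated open reading frames rather than a single assumed frame.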