1.Ongoing Positive Selection Drives the Evolution of SARS-CoV-2 Genomes
Hou YALI ; Zhao SHILEI ; Liu QI ; Zhang XIAOLONG ; Sha TONG ; Su YANKAI ; Zhao WENMING ; Bao YIMING ; Xue YONGBIAO ; Chen HUA
Genomics, Proteomics & Bioinformatics 2022;(6):1214-1223
SARS-CoV-2 is a new RNA virus affecting humans that has spread extensively throughout the world since its first outbreak in December 2019. Whether the transmissibility and pathogenicity of SARS-CoV-2 in humans after zoonotic transfer are actively evolving, and driven by adaptation to the new host and environments, is still under debate. Understanding the evolutionary mechanism underlying the epidemiological and pathological characteristics of COVID-19 is essential for predicting the epidemic trend and for providing guidance for disease control and treatments. Applying novel strategies for identifying natural selection from within-species polymorphisms to 3,674,076 SARS-CoV-2 genome sequences from 169 countries (as of December 30, 2021), we demonstrate with population genetic evidence that, during the course of the SARS-CoV-2 pandemic in humans, 1) SARS-CoV-2 genomes are overall conserved under purifying selection, especially for the 14 genes related to viral RNA replication, transcription, and assembly; 2) ongoing positive selection is actively driving the evolution of 6 genes (e.g., S, ORF3a, and N) that play critical roles in molecular processes involving pathogen-host interactions, including viral invasion into and egress from host cells, and viral inhibition and evasion of the host immune response, possibly leading to high transmissibility and mild symptoms in SARS-CoV-2 evolution. According to an established haplotype phylogenetic relationship of 138 viral clusters, a spatial and temporal landscape of 556 critical mutations is constructed based on their divergence among viral haplotype clusters or their repeated increase in frequency within at least 2 clusters. Among these, multiple mutations potentially conferring alterations in the transmissibility, pathogenicity, and virulence of SARS-CoV-2 are highlighted, warranting attention.
2.Population Genetics of SARS-CoV-2: Disentangling Effects of Sampling Bias and Infection Clusters
Liu QI ; Zhao SHILEI ; Shi CHENG-MIN ; Song SHUHUI ; Zhu SIHUI ; Su YANKAI ; Zhao WENMING ; Li MINGKUN ; Bao YIMING ; Xue YONGBIAO ; Chen HUA
Genomics, Proteomics & Bioinformatics 2020;18(6):640-647
A novel RNA virus, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is responsible for the ongoing outbreak of coronavirus disease 2019 (COVID-19). Population genetic analysis could be useful for investigating the origin and evolutionary dynamics of COVID-19. However, due to extensive sampling bias and the existence of infection clusters during the epidemic spread, direct application of existing approaches can lead to biased parameter estimates and data misinterpretation. In this study, we first present a robust estimator for the time to the most recent common ancestor (TMRCA) and the mutation rate, and then apply the approach to analyze 12,909 genomic sequences of SARS-CoV-2. The mutation rate is inferred to be 8.69 × 10⁻⁴ per site per year with a 95% confidence interval (CI) of [8.61 × 10⁻⁴, 8.77 × 10⁻⁴], and the TMRCA of the samples is inferred to be Nov 28, 2019 with a 95% CI of [Oct 20, 2019, Dec 9, 2019]. The results indicate that COVID-19 might have originated earlier than, and outside of, the Wuhan Seafood Market. We further demonstrate that genetic polymorphism patterns, including the enrichment of specific haplotypes and the temporal allele frequency trajectories generated from infection clusters, are similar to those caused by evolutionary forces such as natural selection. Our results show that population genetic methods need to be developed to efficiently disentangle the effects of sampling bias and infection clusters in order to gain insights into the evolutionary mechanism of SARS-CoV-2. Software for implementing VirusMuT can be downloaded at https://bigd.big.ac.cn/biocode/tools/BT007081.
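To give a sense of scale for the per-site rate quoted in this abstract, the short sketch below converts it to an expected number of substitutions per genome per year and measures the width of the reported TMRCA interval. It is only illustrative arithmetic, not the VirusMuT estimator: the genome length of 29,903 nt (the Wuhan-Hu-1 reference, NC_045512.2) is an assumption brought in from outside the abstract, and all function and constant names are hypothetical.

```python
from datetime import date

# Assumption (not stated in the abstract): length of the Wuhan-Hu-1
# reference genome, NC_045512.2, in nucleotides.
GENOME_LENGTH_NT = 29_903

# Values quoted in the abstract (substitutions per site per year).
RATE = 8.69e-4
RATE_CI = (8.61e-4, 8.77e-4)

# TMRCA point estimate and 95% CI quoted in the abstract.
TMRCA = date(2019, 11, 28)
TMRCA_CI = (date(2019, 10, 20), date(2019, 12, 9))


def subs_per_genome_per_year(rate_per_site, genome_length=GENOME_LENGTH_NT):
    """Expected substitutions per genome per year for a per-site rate."""
    return rate_per_site * genome_length


def expected_subs_since(tmrca, sample_date, rate_per_site,
                        genome_length=GENOME_LENGTH_NT):
    """Expected substitutions accumulated on one lineage between the
    TMRCA and a sampling date, assuming a constant (clock-like) rate."""
    years = (sample_date - tmrca).days / 365.25
    return rate_per_site * genome_length * years


# Per-genome rate for the point estimate and both CI bounds.
per_genome = subs_per_genome_per_year(RATE)
per_genome_ci = tuple(subs_per_genome_per_year(r) for r in RATE_CI)

# Width of the reported TMRCA confidence interval, in days.
tmrca_ci_days = (TMRCA_CI[1] - TMRCA_CI[0]).days

print(f"substitutions per genome per year: {per_genome:.2f}")
print(f"95% CI: {per_genome_ci[0]:.2f} to {per_genome_ci[1]:.2f}")
print(f"TMRCA CI width: {tmrca_ci_days} days")
```

Under the stated genome-length assumption, the point estimate corresponds to roughly 26 substitutions per genome per year, or about one every two weeks along a single lineage. `expected_subs_since` can be called with any sampling date to see how many differences from the common ancestor a constant-rate clock would predict.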
3.The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR
Song SHUHUI ; Ma LINA ; Zou DONG ; Tian DONGMEI ; Li CUIPING ; Zhu JUNWEI ; Chen MEILI ; Wang ANKE ; Ma YINGKE ; Li MENGWEI ; Teng XUFEI ; Cui YING ; Duan GUANGYA ; Zhang MOCHEN ; Jin TONG ; Shi CHENGMIN ; Du ZHENGLIN ; Zhang YADONG ; Liu CHUANDONG ; Li RUJIAO ; Zeng JINGYAO ; Hao LILI ; Jiang SHUAI ; Chen HUA ; Han DALI ; Xiao JINGFA ; Zhang ZHANG ; Zhao WENMING ; Xue YONGBIAO ; Bao YIMING
Genomics, Proteomics & Bioinformatics 2020;18(6):749-759
On January 22, 2020, the China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality-evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. The spatiotemporal change of each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2-relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making it a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.
4.Genomic Epidemiology of SARS-CoV-2 in Pakistan
Song SHUHUI ; Li CUIPING ; Kang LU ; Tian DONGMEI ; Badar NAZISH ; Ma WENTAI ; Zhao SHILEI ; Jiang XUAN ; Wang CHUN ; Sun YONGQIAO ; Li WENJIE ; Lei MENG ; Li SHUANGLI ; Qi QIUHUI ; Ikram AAMER ; Salman MUHAMMAD ; Umair MASSAB ; Shireen HUMA ; Batool FATIMA ; Zhang BING ; Chen HUA ; Yang YUN-GUI ; Abbasi Ali AMIR ; Li MINGKUN ; Xue YONGBIAO ; Bao YIMING
Genomics, Proteomics & Bioinformatics 2021;19(5):727-740
COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmissions of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. Meanwhile, we found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions among them or coevolution. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinctive spreading clusters. The largest cluster consisted of 74 viruses which were derived from different geographic locations of Pakistan and formed a deep hierarchical structure, indicating an extensive and persistent nationwide transmission of the virus that was probably attributable to a signature mutation (G8371T in ORF1ab) of this cluster. Furthermore, 28 putative international introductions were identified, several of which are consistent with the epidemiological investigations. In all, this study has inferred the possible pathways of introductions and transmissions of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.