1. The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR
Song SHUHUI ; Ma LINA ; Zou DONG ; Tian DONGMEI ; Li CUIPING ; Zhu JUNWEI ; Chen MEILI ; Wang ANKE ; Ma YINGKE ; Li MENGWEI ; Teng XUFEI ; Cui YING ; Duan GUANGYA ; Zhang MOCHEN ; Jin TONG ; Shi CHENGMIN ; Du ZHENGLIN ; Zhang YADONG ; Liu CHUANDONG ; Li RUJIAO ; Zeng JINGYAO ; Hao LILI ; Jiang SHUAI ; Chen HUA ; Han DALI ; Xiao JINGFA ; Zhang ZHANG ; Zhao WENMING ; Xue YONGBIAO ; Bao YIMING
Genomics, Proteomics & Bioinformatics 2020;18(6):749-759
On January 22, 2020, the China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality-evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. The spatiotemporal change of each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are generated from all complete, high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature, making it a timely and valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.
2. Whole Genome Analyses of Chinese Population and De Novo Assembly of A Northern Han Genome
Zhenglin DU ; Liang MA ; Hongzhu QU ; Wei CHEN ; Bing ZHANG ; Xi LU ; Weibo ZHAI ; Xin SHENG ; Yongqiao SUN ; Wenjie LI ; Meng LEI ; Qiuhui QI ; Na YUAN ; Shuo SHI ; Jingyao ZENG ; Jinyue WANG ; Yadong YANG ; Qi LIU ; Yaqiang HONG ; Lili DONG ; Zhewen ZHANG ; Dong ZOU ; Yanqing WANG ; Shuhui SONG ; Fan LIU ; Xiangdong FANG ; Hua CHEN ; Xin LIU ; Jingfa XIAO ; Changqing ZENG
Genomics, Proteomics & Bioinformatics 2019;17(3):229-247
Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining the sequencing strategies of PacBio, 10× Genomics, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of the 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 was significantly correlated with waist circumference in northern Han males. Moreover, significant genetic diversity between northerners and southerners was observed in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there exists a "comfort" zone for a high frequency of 677T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.