RGAAT: A Reference-based Genome Assembly and Annotation Tool for New Genomes and Upgrade of Known Genomes.
10.1016/j.gpb.2018.03.006
- Author:
Wanfei LIU 1,2,3;
Shuangyang WU 1,4;
Qiang LIN 1,5;
Shenghan GAO 6;
Feng DING 7;
Xiaowei ZHANG 6;
Hasan Awad ALJOHI 8;
Jun YU 1,9;
Songnian HU 1,10
Author Information
1. CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
2. Joint Center for Genomics Research (JCGR), King Abdulaziz City for Science and Technology and Chinese Academy of Sciences, Riyadh 11442, Saudi Arabia
3. Grail Scientific Co. Ltd., Shenyang 110000, China.
4. University of Chinese Academy of Sciences, Beijing 100049, China.
5. Joint Center for Genomics Research (JCGR), King Abdulaziz City for Science and Technology and Chinese Academy of Sciences, Riyadh 11442, Saudi Arabia.
6. CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
7. Shenzhen Institute of Geriatrics, Shenzhen 518020, China.
8. Joint Center for Genomics Research (JCGR), King Abdulaziz City for Science and Technology and Chinese Academy of Sciences, Riyadh 11442, Saudi Arabia. Electronic address: haljohi@kacst.edu.sa.
9. Joint Center for Genomics Research (JCGR), King Abdulaziz City for Science and Technology and Chinese Academy of Sciences, Riyadh 11442, Saudi Arabia. Electronic address: junyu@big.ac.cn.
10. Joint Center for Genomics Research (JCGR), King Abdulaziz City for Science and Technology and Chinese Academy of Sciences, Riyadh 11442, Saudi Arabia. Electronic address: husn@big.ac.cn.
- Publication Type:Journal Article
- Keywords:
Genome annotation;
Genome assembly;
Genome comparison;
Variant identification
- MeSH:
Genome;
Genomics;
High-Throughput Nucleotide Sequencing/methods;
High-Throughput Nucleotide Sequencing/standards;
Humans;
Reference Standards;
Sequence Analysis, DNA/methods;
Sequence Analysis, DNA/standards;
Software
- From:
Genomics, Proteomics & Bioinformatics
2018;16(5):373-381
- Country:China
- Language:English
- Abstract:
The rapid development of high-throughput sequencing technologies has dramatically reduced the cost and time required for de novo genome sequencing and genome resequencing projects, and new genome sequences are now released every week. The resulting plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks efficiently, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. On the four DNA-seq datasets tested in this study, RGAAT detects sequence variants with precision, specificity, and sensitivity comparable to those of GATK, and with higher precision and specificity than FreeBayes and SAMtools. RGAAT can also identify sequence variants from cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking the true allele frequency into account. Finally, RGAAT uses the sequence variants to generate a coordinate conversion file between the reference and query genomes, and it supports annotation file transfer. Compared with the rapid annotation transfer tool (RATT), RGAAT performs better at annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is freely available at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2.
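The coordinate conversion step described in the abstract can be illustrated with a minimal sketch. This is not RGAAT's code or its file format: the `Variant` record, `build_index`, and `ref_to_query` are hypothetical names, and the sketch assumes 1-based, VCF-style, sorted, non-overlapping variants on a single sequence. It only shows the general idea that insertions and deletions shift every downstream coordinate by their length difference.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical VCF-like record: 1-based reference position plus alleles."""
    pos: int
    ref: str
    alt: str


def build_index(variants: List[Variant]) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Precompute, per variant, its reference end, the cumulative shift
    accumulated before it, and the length of its alternate allele."""
    starts: List[int] = []
    info: List[Tuple[int, int, int]] = []  # (ref_end, shift_before, alt_len)
    shift = 0
    for v in sorted(variants, key=lambda x: x.pos):
        starts.append(v.pos)
        info.append((v.pos + len(v.ref) - 1, shift, len(v.alt)))
        shift += len(v.alt) - len(v.ref)
    return starts, info


def ref_to_query(pos: int, index: Tuple[List[int], List[Tuple[int, int, int]]]) -> Optional[int]:
    """Map a reference coordinate to the query (consensus) coordinate.
    Returns None when the reference base is deleted in the query."""
    starts, info = index
    i = bisect_right(starts, pos) - 1  # last variant starting at or before pos
    if i < 0:
        return pos  # upstream of every variant: coordinates are unchanged
    ref_end, shift_before, alt_len = info[i]
    start = starts[i]
    if pos <= ref_end:
        # Inside the variant's reference span: only the first alt_len bases survive.
        return pos + shift_before if pos - start < alt_len else None
    ref_len = ref_end - start + 1
    return pos + shift_before + (alt_len - ref_len)


if __name__ == "__main__":
    idx = build_index([
        Variant(5, "A", "ATT"),   # 2-base insertion after position 5
        Variant(10, "GCC", "G"),  # 2-base deletion of positions 11-12
        Variant(20, "A", "T"),    # SNP, no shift
    ])
    for p in (3, 6, 11, 13, 20):
        print(p, ref_to_query(p, idx))
```

The same index could be applied to the start and end of each annotated feature to transfer an annotation file, with a policy decision needed for features whose boundaries fall in deleted bases.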