SeqSQC: A Bioconductor Package for Evaluating the Sample Quality of Next-generation Sequencing Data.
10.1016/j.gpb.2018.07.006
- Author:
Qian LIU 1,2; Qiang HU 3; Song YAO 4; Marilyn L KWAN 5; Janise M ROH 5; Hua ZHAO 6; Christine B AMBROSONE 4; Lawrence H KUSHI 5; Song LIU 3; Qianqian ZHU 7
Author Information
1. Department of Biostatistics, University at Buffalo, SUNY, Buffalo, NY 14260, USA.
2. Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA. Electronic address: qliu7@buffalo.edu.
3. Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA.
4. Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA.
5. Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA.
6. Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
7. Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA. Electronic address: Qianqian.Zhu@roswellpark.org.
- Publication Type:Journal Article
- Keywords:
1000 Genomes Project;
Bioconductor package;
Next-generation sequencing;
Quality assessment;
Whole-exome sequencing
- MeSH:
Breast Neoplasms/genetics;
Cohort Studies;
Continental Population Groups/genetics;
Female;
Genome, Human;
High-Throughput Nucleotide Sequencing/methods/standards;
Humans;
Software;
Whole Exome Sequencing
- From:
Genomics, Proteomics & Bioinformatics
2019;17(2):211-218
- Country:China
- Language:English
- Abstract:
As next-generation sequencing (NGS) technology has become widely used to identify causal genetic variants for various diseases and traits, a number of packages for checking NGS data quality have become publicly available. In addition to the quality of the sequencing data itself, sample quality issues, such as gender mismatch, abnormal inbreeding coefficient, cryptic relatedness, and population outliers, can also have a fundamental impact on downstream analysis. However, tools specialized in identifying problematic samples from NGS data are lacking, often because of limitations in sample size and variant counts. We developed SeqSQC, a Bioconductor package, to automate and accelerate sample cleaning in NGS data of any scale. SeqSQC is designed for efficient data storage and access, and is equipped with interactive plots for intuitive data visualization to expedite the identification of problematic samples. SeqSQC is available at http://bioconductor.org/packages/SeqSQC.
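To make one of the sample quality checks named in the abstract concrete, the following is a minimal Python sketch of a per-sample inbreeding coefficient using a simple method-of-moments estimator, F = (O - E) / (N - E), where O is the observed number of homozygous genotypes, E the number expected under Hardy-Weinberg equilibrium, and N the number of usable sites. This is an illustration only and is not taken from SeqSQC: the function name, the genotype coding (0, 1, 2 alternate-allele copies, None for missing), and the input layout are assumptions, and the package's own computation may differ.

```python
from typing import Optional, Sequence


def inbreeding_coefficient(
    genotypes: Sequence[Optional[int]],
    alt_freqs: Sequence[float],
) -> float:
    """Method-of-moments inbreeding coefficient for one sample (hypothetical helper).

    genotypes: alternate-allele counts per site (0, 1, or 2; None if missing).
    alt_freqs: alternate-allele frequency per site, assumed to be estimated
               from a reference panel or from the study cohort.
    """
    observed_hom = 0      # O: sites where the sample is homozygous
    expected_hom = 0.0    # E: expected homozygous sites under Hardy-Weinberg
    n_sites = 0           # N: sites that are non-missing and polymorphic

    for genotype, p in zip(genotypes, alt_freqs):
        # Skip missing calls and monomorphic sites, which carry no information.
        if genotype is None or not 0.0 < p < 1.0:
            continue
        n_sites += 1
        if genotype in (0, 2):
            observed_hom += 1
        # Expected homozygosity at this site is 1 - 2p(1 - p).
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)

    if n_sites == 0:
        raise ValueError("no usable sites for this sample")

    return (observed_hom - expected_hom) / (n_sites - expected_hom)


# Toy usage with made-up data: four sites, all with allele frequency 0.5.
f_value = inbreeding_coefficient([0, 1, 2, 1], [0.5, 0.5, 0.5, 0.5])
```

Values far above zero suggest excess homozygosity (for example, inbreeding), while strongly negative values suggest excess heterozygosity (for example, sample contamination). Any cutoff for calling a sample "abnormal" is a study-specific choice and is not specified here.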