1. Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
Lin JIADONG ; Yang XIAOFEI ; Kosters WALTER ; Xu TUN ; Jia YANYAN ; Wang SONGBO ; Zhu QIHUI ; Ryan MALLORY ; Guo LI ; Zhang CHENGSHENG ; The Human Genome Structural Variation Consortium ; Lee CHARLES ; E. Devine SCOTT ; E. Eichler EVAN ; Ye KAI
Genomics, Proteomics & Bioinformatics 2022;20(1):205-218
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered to be the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, the validation rate of CSVs on real data, based on experimental and computational validation as well as manual inspection, is around 70%, and the median experimental and computational breakpoint shifts are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
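To make the idea of "a graph of potential breakpoint connections" plus "pattern growth" more concrete, here is a minimal Python sketch. It is a toy illustration under simplifying assumptions, not Mako's actual algorithm: breakpoints are reduced to integer positions, a connection is any pair of breakpoints assumed to be linked by read evidence, and "growth" is a plain breadth-first expansion from a seed. Any connected set with more than two breakpoints is reported as a CSV candidate, mirroring the "more than two breakpoints" definition above. All function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build undirected adjacency sets from (breakpoint_a, breakpoint_b) pairs.

    Each pair is assumed to represent read evidence linking two breakpoints.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def grow_pattern(graph, seed, visited):
    """Grow a pattern from a seed by repeatedly adding connected breakpoints (BFS)."""
    pattern = {seed}
    queue = deque([seed])
    visited.add(seed)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                pattern.add(neighbor)
                queue.append(neighbor)
    return pattern


def candidate_csvs(edges, min_breakpoints=3):
    """Return sorted breakpoint sets that are connected and have >= min_breakpoints."""
    graph = build_graph(edges)
    visited = set()
    candidates = []
    for seed in sorted(graph):
        if seed not in visited:
            pattern = grow_pattern(graph, seed, visited)
            if len(pattern) >= min_breakpoints:
                candidates.append(sorted(pattern))
    return candidates


if __name__ == "__main__":
    # Hypothetical connections: one three-breakpoint event and one simple two-breakpoint SV.
    links = [(1000, 5000), (5000, 9000), (20000, 25000)]
    print(candidate_csvs(links))
```

Running it prints `[[1000, 5000, 9000]]`: the two-breakpoint connection at 20000–25000 behaves like a simple SV and is filtered out. A real caller would also weigh signal types, orientations, and evidence support when deciding which connections to add.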
2. JAX-CNV: A Whole-genome Sequencing-based Algorithm for Copy Number Detection at Clinical Grade Level
Lee WAN-PING ; Zhu QIHUI ; Yang XIAOFEI ; Liu SILVIA ; Cerveira ELIZA ; Ryan MALLORY ; Mil-Homens ADAM ; Bellfy LAUREN ; Ye KAI ; Lee CHARLES ; Zhang CHENGSHENG
Genomics, Proteomics & Bioinformatics 2022;(6):1197-1206
We aimed to develop a whole-genome sequencing (WGS)-based copy number variant (CNV) calling algorithm with the potential to replace chromosomal microarray assay (CMA) for clinical diagnosis. JAX-CNV was thus developed for CNV detection from WGS data. The performance of this CNV calling algorithm was evaluated in a blinded manner on 31 samples and compared with the 112 CNVs reported by clinically validated CMAs for these 31 samples. JAX-CNV recalled 100% of these CNVs. In addition, JAX-CNV identified an average of 30 CNVs per individual, representing an approximately seven-fold increase over the calls of clinically validated CMAs. Experimental validation of 24 randomly selected CNVs showed one false positive, i.e., a false discovery rate (FDR) of 4.17%. A robustness test on lower-coverage data revealed 100% sensitivity for CNVs larger than 300 kb (the current threshold for the College of American Pathologists) down to 10× coverage. For CNVs larger than 50 kb, sensitivities were 100% for coverage deeper than 20×, 97% for 15×, and 95% for 10×. We developed a WGS-based CNV pipeline, including the newly developed CNV caller JAX-CNV, and found it capable of detecting CMA-reported CNVs with 100% sensitivity at an FDR of about 4%. We propose that JAX-CNV be further examined in a multi-institutional study to justify the transition of first-tier genetic testing from CMAs to WGS. JAX-CNV is available at https://github.com/TheJacksonLaboratory/JAX-CNV.
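The reported FDR and recall follow directly from the validation counts: FDR = FP / (TP + FP), with one false positive among 24 experimentally validated calls, and sensitivity = recalled / reference, with all 112 CMA-reported CNVs recalled. The short Python sketch below reproduces that arithmetic. The function names are illustrative only and are not part of JAX-CNV.

```python
def false_discovery_rate(false_positives, validated_calls):
    """FDR = FP / (TP + FP), where validated_calls = TP + FP."""
    return false_positives / validated_calls


def sensitivity(recalled, reference_total):
    """Fraction of reference (e.g., CMA-reported) CNVs that the caller recovered."""
    return recalled / reference_total


if __name__ == "__main__":
    # Counts taken from the abstract: 1 false positive in 24 validated CNVs,
    # and 112 of 112 CMA-reported CNVs recalled.
    print(f"FDR = {false_discovery_rate(1, 24):.2%}")
    print(f"Sensitivity = {sensitivity(112, 112):.0%}")
```

This prints `FDR = 4.17%` and `Sensitivity = 100%`, matching the figures quoted above.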