1. MultiKano: an automatic cell type annotation tool for single-cell multi-omics data based on Kolmogorov-Arnold network and data augmentation
Siyu LI ; Xinhao ZHUANG ; Songbo JIA ; Songming TANG ; Liming YAN ; Heyang HUA ; Yuhang JIA ; Xuelin ZHANG ; Yan ZHANG ; Qingzhu YANG ; Shengquan CHEN
Protein & Cell 2025;16(5):374-380
2. High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads
Wang BO ; Yang XIAOFEI ; Jia YANYAN ; Xu YU ; Jia PENG ; Dang NINGXIN ; Wang SONGBO ; Xu TUN ; Zhao XIXI ; Gao SHENGHAN ; Dong QUANBIN ; Ye KAI
Genomics, Proteomics & Bioinformatics 2022;20(1):4-13
Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-0 genome assembly with two gaps (named Col-XJTU), built by combining Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate, with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (ranging from 45 to 52). We completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 was completely resolved except for the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using cutting-edge sequencing data and novel computational approaches, we assembled a 3.8-Mb-long CEN1 and a 3.5-Mb-long CEN2. We also investigated the structure and epigenetics of centromeres. Four clusters of CEN180 monomers were detected, and the centromere-specific histone H3-like protein (CENH3) exhibited a strong preference for CEN180 Cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, will serve as a valuable reference to better understand the global pattern of centromeric polymorphisms, as well as the genetic and epigenetic features in plants.
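The QV figures quoted in this abstract can be read as per-base error rates. The sketch below assumes that QV is the usual Phred-scaled consensus quality, QV = -10 * log10(p), where p is the per-base error probability; the abstract does not spell out the definition, so treat this as an illustration rather than the authors' exact calculation. The function names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to a per-base error probability (assumed definition)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled QV."""
    return -10 * math.log10(p)


def expected_errors(qv: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence of the given length."""
    return qv_to_error_rate(qv) * length_bp


if __name__ == "__main__":
    # QV ranges quoted in the abstract for Col-XJTU and TAIR10.1.
    for label, qv in [("Col-XJTU low", 62), ("Col-XJTU high", 68),
                      ("TAIR10.1 low", 45), ("TAIR10.1 high", 52)]:
        print(f"{label}: QV {qv} -> error rate {qv_to_error_rate(qv):.2e}")
```

Under this assumption, QV 60 corresponds to one expected error per million bases, so a 133,725,193-bp assembly at exactly QV 60 would carry about 134 expected base errors; each 10-point QV increase reduces that tenfold.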
3. Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
Lin JIADONG ; Yang XIAOFEI ; Kosters WALTER ; Xu TUN ; Jia YANYAN ; Wang SONGBO ; Zhu QIHUI ; Ryan MALLORY ; Guo LI ; Zhang CHENGSHENG ; The Human Genome Structural Variation Consortium ; Lee CHARLES ; E.Devine SCOTT ; E.Eichler EVAN ; Ye KAI
Genomics, Proteomics & Bioinformatics 2022;20(1):205-218
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered to be the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspections, are around 70%, and the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
