1. A statistical approach designed for finding mathematically defined repeats in shotgun data and determining the length distribution of clone-inserts.
Lan ZHONG ; Kunlin ZHANG ; Xiangang HUANG ; Peixiang NI ; Yujun HAN ; Kai WANG ; Jun WANG ; Songgang LI
Genomics, Proteomics & Bioinformatics 2003;1(1):43-51
The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number are known to have different probabilities of appearing in shotgun data. Based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. Using these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we also used our model to design the length distribution of clone-inserts. In our simulated human and rice genomes, the length distributions of repeats differed, so their optimal length distributions of clone-inserts were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
Animals; Cloning, Molecular; Genome; Genome, Human; Genomics; methods; Humans; Models, Genetic; Models, Statistical; Models, Theoretical; Oryza; genetics; Sequence Analysis, DNA
2. Evolutionary Transients in the Rice Transcriptome
Jun WANG ; Jianguo ZHANG ; Ruiqiang LI ; Hongkun ZHENG ; Jun LI ; Yong ZHANG ; Heng LI ; Peixiang NI ; Songgang LI ; Shengting LI ; Jingqiang WANG ; Dongyuan LIU ; Jason McDERMOTT ; Ram SAMUDRALA ; Siqi LIU ; Jian WANG ; Huanming YANG ; Jun YU ; Gane Ka-Shu WONG
Genomics, Proteomics & Bioinformatics 2010;08(4):211-228
In the canonical version of evolution by gene duplication, one copy is kept unaltered while the other is free to evolve. This process of evolutionary experimentation can persist for millions of years. Since it is so short-lived in comparison to the lifetime of the core genes that make up the majority of most genomes, a substantial fraction of the genome and the transcriptome may, in principle, be attributable to what we will refer to as "evolutionary transients", referring here to both the process and the genes that have gone through or are undergoing this process. Using the rice gene set as a test case, we argue that this phenomenon goes a long way towards explaining why there are so many more rice genes than Arabidopsis genes, and why most excess rice genes show low similarity to eudicots.