1. Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images
Genomics, Proteomics & Bioinformatics 2016;14(2):103-112
Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for extracting features from DNA sequences represented as binary images, and we estimate the similarity between DNA sequences using frequency histograms of the local bitmap patterns in those images. The method runs in time linear in the length of the DNA sequences, which makes it practical even for comparing long sequences such as whole genomes. We tested five distance measures for estimating sequence similarity and found that histogram intersection and Manhattan distance are the most appropriate for phylogenetic analyses.
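The histogram-based comparison can be sketched in Python as follows. The abstract does not say how a DNA sequence is drawn as a binary image or what window size defines a "local pattern", so this sketch assumes a hypothetical 4 × n one-hot image (one row per nucleotide) and 2 × 2 windows. Only the idea of frequency histograms of local bitmap patterns and the two distance measures named above come from the source; `dna_to_binary_image` and the other function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]
Histogram = Dict[Pattern, float]


def dna_to_binary_image(seq: str) -> List[List[int]]:
    """HYPOTHETICAL encoding, not the paper's: a 4 x n one-hot image whose
    pixel (r, i) is 1 when the i-th base equals "ACGT"[r]."""
    seq = seq.upper()
    return [[1 if base == nt else 0 for base in seq] for nt in "ACGT"]


def local_pattern_histogram(image: List[List[int]], k: int = 2) -> Histogram:
    """Relative frequency of every k x k bitmap pattern in a binary image.

    Each window position is visited once, so for a fixed k the cost grows
    linearly with the number of pixels (here 4 x sequence length)."""
    rows, cols = len(image), (len(image[0]) if image else 0)
    if rows < k or cols < k:
        raise ValueError("image is smaller than the k x k window")
    counts: Counter = Counter()
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            # Flatten the k x k window row by row into a hashable tuple.
            window = tuple(image[r + dr][c + dc] for dr in range(k) for dc in range(k))
            counts[window] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def manhattan_distance(h1: Histogram, h2: Histogram) -> float:
    """L1 distance; a pattern absent from one histogram counts as frequency 0."""
    return sum(abs(h1.get(p, 0.0) - h2.get(p, 0.0)) for p in set(h1) | set(h2))


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(h1[p], h2[p]) for p in set(h1) & set(h2))


if __name__ == "__main__":
    h_a = local_pattern_histogram(dna_to_binary_image("ACGTACGTAA"))
    h_b = local_pattern_histogram(dna_to_binary_image("ACGTTCGTAA"))
    print(manhattan_distance(h_a, h_b), histogram_intersection(h_a, h_b))
```

For histograms normalized to sum to 1, as in this sketch, the two measures are linked by L1 = 2(1 − intersection), so here they rank sequence pairs identically.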
2. Significant Deviations in the Configurations of Homologous Tandem Repeats in Prokaryotic Genomes
Shintaro Hirayama; Satoshi Mizuta
Genomics, Proteomics & Bioinformatics 2009;7(4):163-174
We explored the possibility of whole-genome duplication (WGD) in prokaryotic species by statistically analyzing the configurations of the central angles between homologous tandem repeats (TRs) on circular chromosomes. First, we detected TRs on the chromosomes and identified equivalent tandem repeat pairs (ETRPs), where an ETRP is defined as a pair of tandem repeats whose sequences are similar to each other. We then analyzed the central angle distribution of the detected ETRPs on each circular chromosome by comparing it with distributions generated by null models. In these analyses, we estimated P values by simulation, using the Kullback-Leibler divergence as the distance measure between two distributions. The central angle distributions of 8 of the 203 prokaryotic species showed statistically significant deviations (P < 0.05). In particular, we found features characteristic of one round of WGD in the Photorhabdus luminescens genome and of two rounds of WGD in Escherichia coli K12.
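The statistical test described above can be illustrated with a minimal Python sketch under several assumptions not stated in the abstract: ETRP loci are given as base-pair coordinates on a circular chromosome of known length; the central angle is folded into [0, π]; the null model places both loci of each pair uniformly at random, which makes the expected angle distribution uniform; and the observed histogram is compared with that uniform distribution by Kullback-Leibler divergence. The bin count (18, i.e. 10° bins), the null model, and all function names are hypothetical; the paper's own null models may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def central_angle(pos1: int, pos2: int, genome_length: int) -> float:
    """Angle in radians, folded into [0, pi], subtended at the centre of a
    circular chromosome by two loci given as base-pair coordinates."""
    d = abs(pos1 - pos2) % genome_length
    return 2 * math.pi * min(d, genome_length - d) / genome_length


def angle_histogram(angles: Sequence[float], bins: int) -> List[float]:
    """Relative frequencies of angles in equal-width bins over [0, pi]."""
    if not angles:
        raise ValueError("need at least one angle")
    counts = [0] * bins
    for a in angles:
        counts[min(int(a / math.pi * bins), bins - 1)] += 1  # a == pi goes to last bin
    return [c / len(angles) for c in counts]


def kl_from_uniform(p: Sequence[float]) -> float:
    """KL(P || U) with U uniform over the same bins; empty bins of P add 0."""
    q = 1.0 / len(p)
    return sum(pi * math.log(pi / q) for pi in p if pi > 0)


def monte_carlo_p_value(pairs: Sequence[Tuple[int, int]], genome_length: int,
                        bins: int = 18, n_sim: int = 999,
                        seed: int = 0) -> Tuple[float, float]:
    """Observed KL divergence and its simulated P value under a HYPOTHETICAL
    null model in which both loci of every pair are placed uniformly at random."""
    rng = random.Random(seed)
    observed = kl_from_uniform(angle_histogram(
        [central_angle(a, b, genome_length) for a, b in pairs], bins))
    hits = 0
    for _ in range(n_sim):
        simulated = [central_angle(rng.randrange(genome_length),
                                   rng.randrange(genome_length), genome_length)
                     for _ in pairs]
        if kl_from_uniform(angle_histogram(simulated, bins)) >= observed:
            hits += 1
    # Add-one correction: a Monte Carlo P value is never reported as exactly 0.
    return observed, (hits + 1) / (n_sim + 1)
```

Pairs clustered at one angle, such as 50 pairs all a quarter-turn apart, give a large divergence and a P value near the smallest attainable value, 1/(n_sim + 1).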