1.Web-Based Database and Viewer of East Asian Copy Number Variations.
Ji Hong KIM ; Hae Jin HU ; Yeun Jun CHUNG
Genomics & Informatics 2012;10(1):65-67
In a previous study, we discovered copy number variations (CNVs) in 3,578 Korean individuals using the Affymetrix Genome-Wide SNP array 5.0 and defined 4,003 copy number variation regions (CNVRs). To make these variants easy to explore in related studies, we built a database cataloging the CNVs and related information. The system helps researchers browse these variants with gene and structural variant annotations. Users can find specific regions with search options and verify them in system-integrated genome browsers with annotations.
Asian Continental Ancestry Group; Cataloging; Coat Protein Complex I; Genome; Humans
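A CNVR is commonly defined as the union of CNVs that overlap across individuals. The sketch below assumes that simple definition, because the study's exact merging criteria are not given here. It shows how CNV calls could be merged into CNVRs and how a region search could be modeled as an interval-overlap query. The `CNV` record and the function names are hypothetical and are not the database's actual schema or API.

```python
from typing import List, NamedTuple, Tuple

class CNV(NamedTuple):
    sample: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

Region = Tuple[str, int, int, int]  # (chrom, start, end, number of CNVs merged)

def merge_into_cnvrs(cnvs: List[CNV]) -> List[Region]:
    """Merge CNVs that overlap on the same chromosome into CNVRs (simple union rule)."""
    regions: List[Region] = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        if regions and regions[-1][0] == cnv.chrom and cnv.start <= regions[-1][2]:
            chrom, start, end, n = regions[-1]
            regions[-1] = (chrom, start, max(end, cnv.end), n + 1)
        else:
            regions.append((cnv.chrom, cnv.start, cnv.end, 1))
    return regions

def search_region(regions: List[Region], chrom: str, start: int, end: int) -> List[Region]:
    """Return the CNVRs that overlap the closed interval [start, end] on chrom."""
    return [r for r in regions if r[0] == chrom and r[1] <= end and start <= r[2]]

if __name__ == "__main__":
    # Invented toy coordinates, for illustration only
    cnvs = [
        CNV("S1", "chr1", 100, 500),
        CNV("S2", "chr1", 400, 900),
        CNV("S3", "chr1", 2000, 2500),
        CNV("S1", "chr2", 100, 300),
    ]
    regions = merge_into_cnvrs(cnvs)
    print(regions)  # [('chr1', 100, 900, 2), ('chr1', 2000, 2500, 1), ('chr2', 100, 300, 1)]
    print(search_region(regions, "chr1", 850, 1000))  # [('chr1', 100, 900, 2)]
```

Sorting by chromosome and start lets each CNV be compared only with the most recent region. A production database would more likely use an indexed interval query than the linear scan shown here.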
2.Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions.
Genomics & Informatics 2012;10(1):58-64
Intron prediction is an important problem in genome annotation, which is constantly being updated. Using two model plant genomes (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. Each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc found the correct introns with a false-negative rate of 1.02% to 4.85% but required a longer run time than BLAT. We further evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that 80% (8/10) of the species had similar proportions of the three phases. However, the percentage of 3n introns was excessive in Ostreococcus lucimarinus (47.7%) and deficient in Ostreococcus tauri (29.1%). This discrepancy may have resulted from errors in intron prediction. We suggest that three-phase evaluation is a fast and effective method of detecting intron annotation problems.
Codon; Genome; Genome, Plant; Humans; Introns; Plants; Triplets
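The three-phase check reduces to taking each intron length modulo 3. The sketch below computes the phase percentages. As an assumed add-on that the abstract does not describe, it also computes a chi-square statistic against an equal one-third split. Finally, it shows one way to compute the false-negative rate and false-positive count used to compare BLAT and Sim4cc, treating each intron as a hypothetical (chromosome, start, end) tuple.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Intron = Tuple[str, int, int]  # hypothetical (chromosome, start, end) key

def phase_percentages(lengths: Iterable[int]) -> Dict[str, float]:
    """Percentage of introns whose length is 3n, 3n + 1, or 3n + 2."""
    counts = Counter(length % 3 for length in lengths)
    total = sum(counts.values())
    labels = {0: "3n", 1: "3n+1", 2: "3n+2"}
    return {labels[r]: 100.0 * counts[r] / total for r in range(3)}

def chi_square_vs_equal_phases(lengths: Iterable[int]) -> float:
    """Chi-square statistic (2 degrees of freedom) against a 1/3 : 1/3 : 1/3 split."""
    counts = Counter(length % 3 for length in lengths)
    expected = sum(counts.values()) / 3
    return sum((counts[r] - expected) ** 2 / expected for r in range(3))

def prediction_errors(predicted: Set[Intron], reference: Set[Intron]) -> Tuple[float, int]:
    """Return (false-negative rate in %, number of false-positive introns)."""
    missed = reference - predicted          # reference introns the tool did not report
    false_positives = predicted - reference  # reported introns absent from the reference
    return 100.0 * len(missed) / len(reference), len(false_positives)

if __name__ == "__main__":
    print(phase_percentages([3, 6, 9, 10]))  # {'3n': 75.0, '3n+1': 25.0, '3n+2': 0.0}
```

Genome-scale intron counts make even small departures from one-third produce large chi-square values, well past the 5% critical value of about 5.99 for 2 degrees of freedom. The phase percentages themselves, such as 47.7% 3n introns, are therefore probably the more informative signal.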
3.An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases.
Md Rezaul KARIM ; Md Mamunur RASHID ; Byeong Soo JEONG ; Ho Jin CHOI
Genomics & Informatics 2012;10(1):51-57
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
Base Sequence; Computational Biology; Databases, Nucleic Acid; DNA; Mining
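For small inputs, maximal contiguous frequent patterns can be found by brute force in three steps:

1. Count, for every substring, how many sequences contain it.
2. Keep the substrings that meet a minimum support.
3. Discard any pattern contained in a longer frequent one.

This is a naive baseline for illustration only. It is quadratic in sequence length and is not the authors' efficient approach. The function names and the `min_support` parameter are hypothetical.

```python
from typing import Dict, List

def frequent_substrings(sequences: List[str], min_support: int, min_length: int = 1) -> Dict[str, int]:
    """Map each contiguous pattern to its support (number of sequences containing it)."""
    support: Dict[str, int] = {}
    for seq in sequences:
        # Collect each distinct substring once per sequence
        seen = {seq[i:j] for i in range(len(seq)) for j in range(i + min_length, len(seq) + 1)}
        for pattern in seen:
            support[pattern] = support.get(pattern, 0) + 1
    return {p: c for p, c in support.items() if c >= min_support}

def maximal_patterns(frequent: Dict[str, int]) -> List[str]:
    """Keep frequent patterns that are not contained in a longer frequent pattern."""
    maximal: List[str] = []
    # Longest first: any longer frequent superstring lies inside an already-kept maximal pattern
    for pattern in sorted(frequent, key=len, reverse=True):
        if not any(pattern in kept for kept in maximal):
            maximal.append(pattern)
    return sorted(maximal)

if __name__ == "__main__":
    seqs = ["ACGTAC", "TACGTT", "GACGTA"]
    print(maximal_patterns(frequent_substrings(seqs, min_support=3)))  # ['ACGT', 'TA']
```

Support is counted per sequence, so every substring of a frequent pattern is also frequent. That property is what makes the longest-first filter correct.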
4.Efficient Mining of Interesting Patterns in Large Biological Sequences.
Md Mamunur RASHID ; Md Rezaul KARIM ; Byeong Soo JEONG ; Ho Jin CHOI
Genomics & Informatics 2012;10(1):44-50
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. In most existing approaches, the number of occurrences is the main measure for determining whether a pattern is interesting. In computational biology, however, a pattern that is not frequent may still be very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
Base Sequence; Computational Biology; DNA; Mining
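One common way to express "support exceeding the prior expectation" is an observed-to-expected ratio. In this ratio, the expected count comes from base composition under a background model in which bases occur independently. The sketch below assumes that model. It is not necessarily the interestingness measure proposed in the paper, and it uses a simple scan rather than the authors' index.

```python
from collections import Counter
from math import prod
from typing import Dict, List

def base_frequencies(sequences: List[str]) -> Dict[str, float]:
    """Background probability of each base, estimated from all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}

def count_occurrences(seq: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in seq."""
    return sum(1 for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i))

def observed_expected_ratio(sequences: List[str], pattern: str) -> float:
    """Observed occurrences divided by the count expected if bases were independent."""
    freqs = base_frequencies(sequences)
    k = len(pattern)
    p_pattern = prod(freqs.get(base, 0.0) for base in pattern)
    expected = sum(max(len(s) - k + 1, 0) for s in sequences) * p_pattern
    observed = sum(count_occurrences(s, pattern) for s in sequences)
    # expected == 0 only when the pattern cannot occur at all, so observed is 0 too
    return observed / expected if expected > 0 else 0.0

if __name__ == "__main__":
    seqs = ["AAAA", "CCCC"]
    print(observed_expected_ratio(seqs, "AA"))  # 2.0
    print(observed_expected_ratio(seqs, "AC"))  # 0.0
```

A ratio well above 1 flags a pattern as over-represented even when its raw count is small, which matches the intuition the abstract describes.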
5.Decreases in Casz1 mRNA by an siRNA Complex Do not Alter Blood Pressure in Mice.
Su Min JI ; Young Bin SHIN ; So Yon PARK ; Hyeon Ju LEE ; Bermseok OH
Genomics & Informatics 2012;10(1):40-43
Recent genomewide association studies of large samples have identified genes associated with blood pressure. The Global Blood Pressure Genetics (Global BPgen) and Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) consortia identified 14 loci associated with blood pressure at genomewide significance, one of which, CASZ1, has been confirmed in both Europeans and Asians. CASZ1 is a zinc finger transcription factor that controls apoptosis and cell fate and, like a tumor suppressor, suppresses neuroblastoma tumor growth by reprogramming gene expression. To validate the function of CASZ1 in blood pressure, we decreased Casz1 mRNA levels in mice by siRNA. Casz1 siRNA reduced mRNA levels by 59% in a mouse cell line. A polyethylenimine-mixed siRNA complex was injected into mouse tail veins, reducing Casz1 mRNA expression to 45% in the kidney. However, blood pressure in the treated mice was unaffected, despite a 55% reduction in kidney Casz1 mRNA levels after multiple daily siRNA injections. Although Casz1 siRNA-treated mice showed no significant change in blood pressure, our study demonstrates the value of in vivo siRNA injection for analyzing the function of candidate genes identified by genomewide association studies.
Aging; Animals; Apoptosis; Asian Continental Ancestry Group; Blood Pressure; Cell Line; Cohort Studies; Gene Expression; Genome; Heart; Humans; Kidney; Mice; Neuroblastoma; RNA, Messenger; RNA, Small Interfering; Transcription Factors; Veins; Zinc Fingers
6.CaGe: A Web-Based Cancer Gene Annotation System for Cancer Genomics.
Young Kyu PARK ; Tae Wook KANG ; Su Jin BAEK ; Kwon Il KIM ; Seon Young KIM ; Doheon LEE ; Yong Sung KIM
Genomics & Informatics 2012;10(1):33-39
High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarrays, and serial analysis of gene expression (SAGE), have become effective experimental tools in cancer genomics for identifying cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is identifying the real causative mutations or genes among the many candidates produced by an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to identify candidate cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interactions, and pathways, can be useful for scoring novel candidates. Some cancer gene and mutation databases exist for this purpose, but few specialized tools exist for automated analysis of the long gene lists produced by HGT-based cancer genomic analyses. This report presents a new web-accessible bioinformatic tool, CaGe, a cancer genome annotation system for assessing candidate cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify candidate genes from cancer genome studies into previously reported or novel cancer genes and gain insight into the underlying carcinogenic mechanisms through pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.
Gene Expression; Genes, Neoplasm; Genome; Genomics; Sequence Analysis, DNA; Small Cell Lung Carcinoma
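Two of the steps described here can be sketched as below: splitting candidate genes into previously reported and novel cancer genes, and testing a pathway for over-representation. The abstract does not state which statistical test CaGe uses, so the one-sided hypergeometric test shown is an assumption. The gene list and counts in the usage are invented, and GENE_X is a placeholder symbol.

```python
from math import comb
from typing import Iterable, List, Tuple

def classify_candidates(candidates: Iterable[str], known_cancer_genes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split candidate genes into (previously reported, novel) relative to a known list."""
    known = set(known_cancer_genes)
    unique = set(candidates)  # duplicates in the candidate list are counted once
    reported = sorted(g for g in unique if g in known)
    novel = sorted(g for g in unique if g not in known)
    return reported, novel

def hypergeometric_pvalue(overlap: int, pathway_size: int, list_size: int, universe_size: int) -> float:
    """P(X >= overlap): chance a random list of list_size genes hits the pathway that often."""
    total = comb(universe_size, list_size)
    hits = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, min(pathway_size, list_size) + 1)
    )
    return hits / total

if __name__ == "__main__":
    print(classify_candidates(["TP53", "RB1", "GENE_X", "TP53"], ["TP53", "RB1", "MYC"]))
    # (['RB1', 'TP53'], ['GENE_X'])
    print(hypergeometric_pvalue(overlap=3, pathway_size=3, list_size=3, universe_size=10))
```

Python's `math.comb` returns 0 when the lower argument exceeds the upper one, so impossible overlap counts contribute nothing to the sum.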
7.Possibility of the Use of Public Microarray Database for Identifying Significant Genes Associated with Oral Squamous Cell Carcinoma.
Genomics & Informatics 2012;10(1):23-32
Many studies have attempted to identify expression changes in oral squamous cell carcinoma, but most include too few samples to apply statistical methods for detecting significant gene sets. This study combined two small microarray datasets from a public database and identified significant genes associated with the progression of oral squamous cell carcinoma. The two datasets had different expression scales, even though both were generated on the same platform (Affymetrix U133A gene chips). To obtain more reliable information, we discretized the gene expression values of the two datasets, adjusting for the differences between them. From the combined datasets, we detected 51 significant genes that were upregulated in oral squamous cell carcinoma, most of which have been reported in previous studies as cancer-related genes. From these selected genes, we identified significant genetic pathways associated with the expression changes. By combining several datasets from a public database, sufficient samples can be obtained for detecting reliable information. Most of the selected genes were known cancer-related genes, including genes related to oral squamous cell carcinoma, and several genes not previously linked to cancer can be evaluated biologically in further studies.
Carcinoma, Squamous Cell; Gene Expression; Oligonucleotide Array Sequence Analysis; Weights and Measures
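The abstract does not specify the discretization rule. One plausible sketch, assumed here, works in two steps:

1. Standardize each gene within its own dataset.
2. Map the standardized values to -1 (low), 0, or 1 (high) using a hypothetical z-score cutoff.

This makes datasets on different scales comparable before they are combined. The sketch assumes both datasets measure the same probe set, as they do on a single platform. The values in the usage are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List

Matrix = Dict[str, List[float]]  # gene -> expression values, one per sample

def discretize_dataset(matrix: Matrix, cutoff: float = 1.0) -> Dict[str, List[int]]:
    """Within one dataset, z-score each gene across samples and map to -1/0/1."""
    states: Dict[str, List[int]] = {}
    for gene, values in matrix.items():
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            states[gene] = [0] * len(values)  # constant gene: no signal
            continue
        z = [(v - mu) / sd for v in values]
        states[gene] = [1 if s >= cutoff else -1 if s <= -cutoff else 0 for s in z]
    return states

def combine_datasets(*datasets: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Concatenate the discretized samples of several datasets, gene by gene."""
    combined: Dict[str, List[int]] = {}
    for dataset in datasets:
        for gene, states in dataset.items():
            combined.setdefault(gene, []).extend(states)
    return combined

if __name__ == "__main__":
    a = discretize_dataset({"GENE_A": [1.0, 1.0, 1.0, 5.0]})          # dataset on one scale
    b = discretize_dataset({"GENE_A": [100.0, 100.0, 100.0, 500.0]})  # same pattern, 100x scale
    print(combine_datasets(a, b))  # {'GENE_A': [0, 0, 0, 1, 0, 0, 0, 1]}
```

Both toy datasets discretize identically despite the 100-fold scale difference, which is the property the combination step relies on.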
8.Differential Expression of PKD2-Associated Genes in Autosomal Dominant Polycystic Kidney Disease.
Yeon Joo YOOK ; Yu Mi WOO ; Moon Hee YANG ; Je Yeong KO ; Bo Hye KIM ; Eun Ji LEE ; Eun Sun CHANG ; Min Joo LEE ; Sunyoung LEE ; Jong Hoon PARK
Genomics & Informatics 2012;10(1):16-22
Autosomal dominant polycystic kidney disease (ADPKD) is characterized by the formation of multiple fluid-filled cysts that expand over time and destroy renal architecture. The proteins encoded by the PKD1 and PKD2 genes, mutations in which account for nearly all cases of ADPKD, may help guard against cystogenesis. Previously developed mouse models of PKD1 and PKD2 showed an embryonic lethal phenotype and massive cyst formation in the kidney, indicating that PKD1 and PKD2 probably play important roles during normal renal tubular development. However, their precise roles in development and the cellular mechanisms of cyst formation induced by PKD1 and PKD2 mutations are not fully understood. To address this question, we created Pkd2 knockout and PKD2 transgenic mouse embryo fibroblasts. We used a mouse oligonucleotide microarray to identify messenger RNAs whose expression was altered by PKD2 overexpression or Pkd2 knockout. The majority of the identified genes were involved in critical biological processes, such as metabolism, transcription, cell adhesion, cell cycle, and signal transduction. Through microarray analysis, we confirmed the differential expression of several genes, including aquaporin-1, according to PKD2 expression level in ADPKD mouse models. These data may help elucidate PKD2-related mechanisms of ADPKD pathogenesis.
Animals; Biological Processes; Cell Adhesion; Cell Cycle; Embryonic Structures; Fibroblasts; Kidney; Mice; Mice, Transgenic; Microarray Analysis; Oligonucleotide Array Sequence Analysis; Phenotype; Polycystic Kidney Diseases; Polycystic Kidney, Autosomal Dominant; Proteins; RNA, Messenger; Signal Transduction
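The abstract does not describe the statistics behind the microarray comparison. As an assumed illustration, the sketch below works on log2-scale intensities. For each gene it computes the log2 fold change between PKD2-altered and control fibroblasts and Welch's t statistic. It then flags genes that pass a hypothetical fold-change cutoff. Gene names and values in the usage are invented.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List

def log2_fold_change(treated: List[float], control: List[float]) -> float:
    """Difference of mean log2 intensities, i.e. log2 of the fold change."""
    return mean(treated) - mean(control)

def welch_t(treated: List[float], control: List[float]) -> float:
    """Welch's t statistic (unequal variances); needs at least 2 replicates per group."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se

def flag_differential(treated: Dict[str, List[float]], control: Dict[str, List[float]],
                      lfc_cutoff: float = 1.0) -> List[str]:
    """Genes whose absolute log2 fold change meets the (hypothetical) cutoff."""
    return sorted(g for g in treated if abs(log2_fold_change(treated[g], control[g])) >= lfc_cutoff)

if __name__ == "__main__":
    treated = {"GENE_A": [8.0, 8.2, 7.8], "GENE_B": [5.0, 5.1, 4.9]}
    control = {"GENE_A": [6.0, 6.1, 5.9], "GENE_B": [5.0, 5.0, 5.0]}
    print(flag_differential(treated, control))  # ['GENE_A']
```

In practice a fold-change filter would normally be paired with a multiple-testing-corrected p-value. The t statistic is included as the starting point for that step.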
9.CysQ of Cryptosporidium parvum, a Protozoa, May Have Been Acquired from Bacteria by Horizontal Gene Transfer.
Genomics & Informatics 2012;10(1):9-15
Horizontal gene transfer (HGT) is the transfer of genetic material between organisms other than from parent to offspring, including between kingdoms, and is considered to play a positive role in adaptation. Cryptosporidium parvum is a parasitic protozoan that causes the infectious disease cryptosporidiosis. Sequencing of its genome revealed 14 bacteria-like proteins encoded in the nuclear genome. Among them, cgd2_1810, which has been annotated as CysQ, a sulfite synthesis pathway protein, is listed as a candidate gene horizontally transferred from bacteria. In this report, we examined this issue using phylogenetic analysis. Our BLAST search showed that the C. parvum CysQ protein was most similar to those of proteobacteria. Analysis with NCBI's Conserved Domain Tree showed phylogenetic incongruence: the C. parvum CysQ protein was located within a proteobacterial branch of the cd01638 domain, a bacterial member of the inositol monophosphatase family. According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, the sulfate assimilation pathway, in which CysQ plays an important role, is well conserved in most eukaryotes as well as prokaryotes. However, the Apicomplexa, including C. parvum, largely lack orthologous genes of this pathway, suggesting that it was lost in those protozoan lineages. We therefore conclude that C. parvum regained cysQ from proteobacteria by HGT, although its functional role remains elusive.
Apicomplexa; Bacteria; Communicable Diseases; Cryptosporidium; Cryptosporidium parvum; Eukaryota; Gene Transfer, Horizontal; Genome; Humans; Inositol; Phosphoric Monoester Hydrolases; Proteins; Proteobacteria
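The evidence in this study comes from BLAST similarity and Conserved Domain Tree placement. A related quick screen that the abstract does not use is the alien index (Gladyshev et al., 2008). It compares the best BLAST E-value among eukaryotic hits with the best E-value among bacterial hits, and a positive value means the bacterial hit is better. Several details are assumptions of this sketch:

- the hit-list format;
- excluding the query's own genus from the eukaryotic hits;
- a default E-value of 1.0 for a group with no hits.

The genera and E-values in the usage are invented.

```python
from math import log
from typing import Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (domain, genus, E-value), a hypothetical hit format

def best_evalue(hits: List[Hit], domain: str, exclude_genus: str) -> float:
    """Smallest E-value among hits from `domain`, skipping the query's own genus."""
    values = [e for d, genus, e in hits if d == domain and genus != exclude_genus]
    return min(values, default=1.0)  # assumed value when the group has no hit

def alien_index(hits: Iterable[Hit], exclude_genus: str) -> float:
    """log(best eukaryotic E-value + 1e-200) - log(best bacterial E-value + 1e-200)."""
    hit_list = list(hits)
    euk = best_evalue(hit_list, "Eukaryota", exclude_genus)
    bac = best_evalue(hit_list, "Bacteria", exclude_genus)
    return log(euk + 1e-200) - log(bac + 1e-200)

if __name__ == "__main__":
    hits = [
        ("Bacteria", "genus_A", 1e-50),
        ("Eukaryota", "Cryptosporidium", 1e-150),  # the query's own genus, excluded
        ("Eukaryota", "genus_B", 1e-10),
    ]
    print(alien_index(hits, exclude_genus="Cryptosporidium") > 0)  # True
```

A screen like this only flags candidates. Tree-based evidence of the kind reported in the abstract is still needed to argue for transfer.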
10.Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling.
Jong Sung LIM ; Beom Soon CHOI ; Jeong Soo LEE ; Chanseok SHIN ; Tae Jin YANG ; Jae Sung RHEE ; Jae Seong LEE ; Ik Young CHOI
Genomics & Informatics 2012;10(1):1-8
Recently, technologies for analyzing DNA sequence variation and profiling gene expression have been widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the Roche/454 and Illumina/Solexa next-generation DNA sequencing (NGS) systems, along with bioinformatic technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate-pair sequencing data are important for the de novo assembly of novel genomes. Effective de novo assembly requires DNA sequence data from multiple NGS platforms, with at least 2x genome coverage from Roche/454 and 30x from Illumina/Solexa. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profiles are useful for genome systems biology, quantifying the RNAs expressed from the whole-genome transcriptome of each tissue sample. Combined mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20x (Roche/454) and 50x (Illumina/Solexa) of the estimated transcriptome is effective for creating novel expressed reference sequences. However, an average of only 30x transcriptome coverage with Illumina/Solexa short reads is sufficient for expression quantification against reference expressed sequence tag (EST) sequences.
Base Sequence; Biology; Chimera; DNA; DNA Fingerprinting; Expressed Sequence Tags; Gene Expression Profiling; Genome; RNA; RNA, Messenger; Sequence Analysis, DNA; Transcriptome
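The coverage figures quoted here follow the usual relation: coverage = (number of reads × read length) / genome or transcriptome size. Under the idealized Lander-Waterman model, the expected fraction of bases left uncovered at coverage c is e^(-c). The sketch below applies both formulas. The 500 Mb genome size and the 400 bp and 100 bp read lengths are hypothetical values chosen for illustration.

```python
from math import ceil, exp

def reads_needed(target_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Reads required so that reads * read_length / target_size >= coverage."""
    return ceil(coverage * target_size_bp / read_length_bp)

def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) expectation of the fraction of bases with zero reads."""
    return exp(-coverage)

if __name__ == "__main__":
    genome = 500_000_000  # hypothetical 500 Mb genome
    print(reads_needed(genome, 400, 2))   # 2500000 reads of an assumed 400 bp (Roche/454, 2x)
    print(reads_needed(genome, 100, 30))  # 150000000 reads of an assumed 100 bp (Illumina/Solexa, 30x)
    print(round(expected_uncovered_fraction(2), 3))  # 0.135
```

The same arithmetic applies to the transcriptome figures (20x, 50x, and 30x) if `target_size_bp` is set to the estimated transcriptome size. Real libraries cover a genome unevenly, so the uncovered fraction in practice is typically higher than e^(-c).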