1.Comparison of Erythrocyte Traits Among European, Japanese and Korean.
Genomics & Informatics 2010;8(3):159-163
Erythrocyte traits are heritable and serve as indirect indicators of erythrocyte-related blood diseases, but their genetic determinants are largely unknown. We therefore performed a genome-wide association study in 8,842 Korean individuals to identify genetic factors influencing erythrocyte traits. We identified 40 associations for three erythrocyte traits at the genome-wide significance level (p < 1 × 10⁻⁶). We compared these associated loci with those reported in genome-wide association studies of European and Japanese populations. Our findings include loci previously identified in other studies (HBS1L-MYB, TMPRSS6, USP49 and CCND3) and novel associations (MRDS1/OFCC1, CSDE1, NRAS and 8 other loci). For example, SNP rs4895440 in the HBS1L-MYB intergenic region on chromosome 6q23.3 shows one of the most significant associations with erythrocyte traits (p = 8.33 × 10⁻²⁷).
Asian Continental Ancestry Group; DNA, Intergenic; Erythrocytes; Genome-Wide Association Study; Hematocrit; Hematologic Diseases; Hemoglobins; Humans
2.Application of Structural Equation Models to Genome-wide Association Analysis.
Jiyoung KIM ; Junghyun NAMKUNG ; Seungmook LEE ; Taesung PARK
Genomics & Informatics 2010;8(3):150-158
Genome-wide association studies (GWASs) have become popular approaches to identify genetic variants associated with human biological traits. In this study, we applied Structural Equation Models (SEMs) to model complex relationships between genetic networks and traits as risk factors. SEMs allow us to achieve a better understanding of biological mechanisms by identifying greater numbers of genes and pathways that are associated with a set of traits and the relationships among them. For efficient SEM analysis of GWAS data, we developed a procedure comprising four stages. In the first stage, we conducted single-SNP analysis using regression models, where age, sex, and recruitment area were included as adjusting covariates. In the second stage, Fisher's combination test was conducted for each gene to detect significant genes using p-values obtained from the single-SNP analysis. In the third stage, Fisher's exact test was adopted to determine which biological pathways were enriched with significant SNPs. Finally, based on a pathway that was associated with all four traits, an SEM was fit to model a causal relationship among the genetic factors and traits. We applied our SEM to GWAS data with four central obesity-related traits: suprailiac and subscapular measures for upper body fat, BMI, and hypertension. Study subjects were collected from two Korean cohort regions. After quality control, 327,872 SNPs for 8,842 individuals were included in the analysis. After comparing two SEMs, we concluded that suprailiac and subscapular measures may indirectly affect hypertension susceptibility by influencing BMI. In conclusion, our analysis demonstrates that SEMs provide a better understanding of biological mechanisms by identifying greater numbers of genes and pathways.
Adipose Tissue; Cohort Studies; Humans; Hypertension; Obesity, Abdominal; Polymorphism, Single Nucleotide; Quality Control; Risk Factors
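The second stage described in the abstract above, combining single-SNP p-values into one gene-level p-value with Fisher's combination test, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `fisher_combined_pvalue` and the example p-values are hypothetical. Fisher's method assumes the combined p-values are independent, which SNPs in linkage disequilibrium may not satisfy.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (hypothetical helper)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Test statistic X = -2 * sum(ln p_i); under the null it follows a
    # chi-square distribution with 2k degrees of freedom. We keep X / 2.
    half_x = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


# Illustrative gene-level call; these SNP p-values are made up for the example.
gene_pvalue = fisher_combined_pvalue([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail sum.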
3.Editor's Introduction to This Issue.
Genomics & Informatics 2013;11(2):59-59
No abstract available.
4.Web-Based Database and Viewer of East Asian Copy Number Variations.
Ji Hong KIM ; Hae Jin HU ; Yeun Jun CHUNG
Genomics & Informatics 2012;10(1):65-67
We discovered copy number variations (CNVs) in 3,578 Korean individuals with the Affymetrix Genome-Wide SNP array 5.0, and 4,003 copy number variation regions (CNVRs) were defined in a previous study. To make the details of these variants easy to explore in related studies, we built a database cataloging the CNVs and related information. This system helps researchers browse these variants with gene and structural variant annotations. Users can easily find specific regions with search options and verify them in system-integrated genome browsers with annotations.
Asian Continental Ancestry Group; Cataloging; Coat Protein Complex I; Genome; Humans
5.Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions.
Genomics & Informatics 2012;10(1):58-64
Intron prediction is an important problem in constantly updated genome annotation. Using two model plant genomes (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each of the tools had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns with a small number of false-positive introns. Sim4cc was successful at finding the correct introns with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. As non-coding sequences, introns are not constrained by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It has been widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 80% (8/10) of the species the three phases occurred in similar numbers. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
Codon; Genome; Genome, Plant; Humans; Introns; Plants; Triplets
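The three-phase evaluation described in the abstract above amounts to classifying each intron length by its remainder modulo 3 and comparing the resulting percentages. The sketch below illustrates that idea only. The function names and the 5-percentage-point tolerance are assumptions made for this example, not values taken from the study.

```python
from collections import Counter

PHASE_LABELS = ("3n", "3n+1", "3n+2")


def intron_phase_percentages(lengths):
    """Return the percentage of intron lengths in each phase (length mod 3)."""
    if not lengths:
        raise ValueError("need at least one intron length")
    counts = Counter(length % 3 for length in lengths)
    total = len(lengths)
    return {label: 100.0 * counts[r] / total for r, label in enumerate(PHASE_LABELS)}


def flag_skewed_phases(percentages, tolerance=5.0):
    """List phases whose share deviates from one third by more than `tolerance`.

    `tolerance` (in percentage points) is an arbitrary illustrative cutoff.
    """
    expected = 100.0 / 3.0
    return [label for label, pct in percentages.items() if abs(pct - expected) > tolerance]


# Toy input: four intron lengths, two of which are multiples of three.
toy = intron_phase_percentages([90, 91, 92, 93])
```

A genome whose annotation yields a flagged phase would then be a candidate for closer inspection of its intron predictions.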
6.An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases.
Md Rezaul KARIM ; Md Mamunur RASHID ; Byeong Soo JEONG ; Ho Jin CHOI
Genomics & Informatics 2012;10(1):51-57
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
Base Sequence; Computational Biology; Databases, Nucleic Acid; DNA; Mining
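To make the term "maximal contiguous frequent pattern" in the abstract above concrete, here is a naive level-wise sketch. It is not the authors' algorithm and makes no claim to their memory efficiency. It assumes that support means the number of sequences containing the pattern, and that a pattern is maximal when no longer frequent contiguous pattern contains it. The function names and the toy sequences are hypothetical.

```python
def frequent_contiguous(sequences, min_support):
    """Return {pattern: support} for contiguous patterns found in >= min_support sequences."""
    frequent = {}
    k = 1
    while True:
        support = {}
        for seq in sequences:
            # Count each distinct k-mer once per sequence.
            seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
            for pattern in seen:
                support[pattern] = support.get(pattern, 0) + 1
        level = {p: s for p, s in support.items() if s >= min_support}
        if not level:
            # No frequent pattern of length k, so none of any greater length.
            return frequent
        frequent.update(level)
        k += 1


def maximal_patterns(frequent):
    """Keep only patterns that are not a prefix or suffix of a longer frequent pattern."""
    covered = set()
    for pattern in frequent:
        covered.add(pattern[:-1])
        covered.add(pattern[1:])
    return sorted(p for p in frequent if p not in covered)


toy_sequences = ["ACGTAC", "TACGTT", "GACGTA"]
toy_frequent = frequent_contiguous(toy_sequences, min_support=3)
toy_maximal = maximal_patterns(toy_frequent)
```

No pattern length is specified in advance: the loop stops at the first length with no frequent pattern, which is the property the abstract highlights.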
7.Efficient Mining of Interesting Patterns in Large Biological Sequences.
Md Mamunur RASHID ; Md Rezaul KARIM ; Byeong Soo JEONG ; Ho Jin CHOI
Genomics & Informatics 2012;10(1):44-50
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, most approaches have used the number of occurrences as the main measure for determining whether a pattern is interesting. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
Base Sequence; Computational Biology; DNA; Mining
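The abstract above does not define its measure, so the following is only one possible illustration of "observed support exceeding prior expectation". It compares a pattern's observed count with the count expected if bases were drawn independently at their observed frequencies. The independence model, the ratio, and all function names are assumptions introduced for this sketch.

```python
from collections import Counter


def count_occurrences(sequence, pattern):
    """Count (possibly overlapping) occurrences of pattern in sequence."""
    k = len(pattern)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == pattern)


def expected_occurrences(sequence, pattern):
    """Expected count under an independent-base model (an assumed null model)."""
    if not sequence or not pattern:
        raise ValueError("sequence and pattern must be non-empty")
    freqs = Counter(sequence)
    n = len(sequence)
    prob = 1.0
    for base in pattern:
        prob *= freqs[base] / n
    positions = max(0, n - len(pattern) + 1)
    return positions * prob


def surprise_ratio(sequence, pattern):
    """Observed / expected count; values well above 1 suggest an unexpected pattern."""
    expected = expected_occurrences(sequence, pattern)
    if expected == 0:
        return 0.0
    return count_occurrences(sequence, pattern) / expected


toy_ratio = surprise_ratio("ACACACGT", "AC")
```

Under a measure of this kind, a rare pattern can still rank highly when its expected count is much smaller than its observed count.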
8.Decreases in Casz1 mRNA by an siRNA Complex Do not Alter Blood Pressure in Mice.
Su Min JI ; Young Bin SHIN ; So Yon PARK ; Hyeon Ju LEE ; Bermseok OH
Genomics & Informatics 2012;10(1):40-43
Recent genomewide association studies of large samples have identified genes that are associated with blood pressure. The Global Blood Pressure Genetics (Global BPgen) and Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) consortiums identified 14 loci that govern blood pressure at a genomewide significance level, one of which, CASZ1, was confirmed in both Europeans and Asians. CASZ1 is a zinc finger transcription factor that controls apoptosis and cell fate and, like a tumor suppressor, suppresses neuroblastoma tumor growth by reprogramming gene expression. To validate the function of CASZ1 in blood pressure, we decreased Casz1 mRNA levels in mice by siRNA. Casz1 siRNA reduced mRNA levels by 59% in a mouse cell line. A polyethylenimine-mixed siRNA complex was injected into mouse tail veins, reducing Casz1 mRNA expression to 45% in the kidney. However, blood pressure in the treated mice was unaffected, despite a 55% reduction in Casz1 mRNA levels in the kidney after multiple daily siRNA injections. Even though Casz1 siRNA-treated mice did not experience any significant change in blood pressure, our study demonstrates the value of in vivo siRNA injection in analyzing the function of candidate genes identified by genomewide association studies.
Aging; Animals; Apoptosis; Asian Continental Ancestry Group; Blood Pressure; Cell Line; Cohort Studies; Gene Expression; Genome; Heart; Humans; Kidney; Mice; Neuroblastoma; RNA, Messenger; RNA, Small Interfering; Transcription Factors; Veins; Zinc Fingers
9.CaGe: A Web-Based Cancer Gene Annotation System for Cancer Genomics.
Young Kyu PARK ; Tae Wook KANG ; Su Jin BAEK ; Kwon Il KIM ; Seon Young KIM ; Doheon LEE ; Yong Sung KIM
Genomics & Informatics 2012;10(1):33-39
High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarray, and serial analysis of gene expression (SAGE), have become effective experimental tools for cancer genomics to identify cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is to identify the real causative mutations or genes out of many candidates from an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to determine candidates of cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interaction, and pathways, can be useful for scoring novel candidates. Some cancer gene or mutation databases exist for this purpose, but few specialized tools exist for an automated analysis of a long gene list from an HGT-based cancer genomic analysis. This report presents a new web-accessible bioinformatic tool, called CaGe, a cancer genome annotation system for the assessment of candidates of cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify their candidate genes from cancer genome studies into either previously reported or novel categories of cancer genes and gain insight into underlying carcinogenic mechanisms through a pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.
Gene Expression; Genes, Neoplasm; Genome; Genomics; Sequence Analysis, DNA; Small Cell Lung Carcinoma
10.Possibility of the Use of Public Microarray Database for Identifying Significant Genes Associated with Oral Squamous Cell Carcinoma.
Genomics & Informatics 2012;10(1):23-32
Many studies have attempted to identify expression changes in oral squamous cell carcinoma. Most of them include too few samples to apply statistical methods for detecting significant gene sets. This study combined two small microarray datasets from a public database and identified significant genes associated with the progression of oral squamous cell carcinoma. The two datasets had different expression scales, even though they were generated on the same platform, Affymetrix U133A gene chips. We discretized the gene expression values of the two datasets, adjusting for the differences between them, to obtain more reliable information. From the combination of the two datasets, we detected 51 significant genes that were upregulated in oral squamous cell carcinoma. Most of them have been reported in previous studies as cancer-related genes, including genes related to oral squamous cell carcinoma. From these selected genes, significant genetic pathways associated with expression changes were identified. By combining several datasets from a public database, sufficient samples can be obtained for detecting reliable information. Several genes not previously reported can be biologically evaluated in further studies.
Carcinoma, Squamous Cell; Gene Expression; Oligonucleotide Array Sequence Analysis; Weights and Measures