1.The AB05 NIAB Tools Workbench for Building Automatic Biopathway Maps for Agricultural Organisms.
Mi Kyung CHO ; Kyung Oh YOON ; Hyun Seok PARK
Genomics & Informatics 2007;5(4):200-202
For the past several years, we have built various tools for automatic construction of biopathways to help biological experts, especially in the field of agriculture. We integrated several systems into web applications that analyze biological pathway information for agricultural species and construct optimized pathway maps. In addition to building web applications for agricultural pathway information, we developed several stand-alone software tools, which are publicly downloadable under proper license agreements.
Agriculture
;
Computational Biology
;
Licensure
2.Biological Object Downloader (BOD) Service for Easy Download and Management of Biological Databases.
Daeui PARK ; Jungwoo LEE ; Giseok YOON ; Sungsam GONG ; Jong BHAK
Genomics & Informatics 2007;5(4):196-199
BOD is an FTP service management tool on the Internet. It was developed for biological researchers in South Korea. It enables easier and faster access to bioinformation without having to go through foreign FTP sites. BOD includes an automatic downloader with a management and email alert service from which the user can easily select and schedule any biological database. Once a database is listed in BOD, the user can check and modify its download status and data through an additional email alert service.
Appointments and Schedules
;
Electronic Mail
;
Internet
;
Korea
3.RGISS: Rice (Oryza sativa L. ssp. japonica) Genome Information Service System.
Daesang LEE ; Hwajung SEO ; Jang Ho HAHN ; Eun Bae KONG ; Kiejung PARK
Genomics & Informatics 2007;5(4):194-195
We have constructed the Rice Genome Information Service System (RGISS), which is an information service system of the Oryza sativa L. ssp. japonica (rice) genome, using the released version of rice Build 3.0 pseudomolecules based on the Ensembl architecture. The nonredundant library, composed of 3,360 clones of BACs, PACs, and fosmids, was used to construct supercontigs. RGISS contains 50,717 annotated genes from GenBank, 56,161 predicted genes from FgeneSH, and information on 9,587 markers, which includes STS, SSR, and EST-based RFLP. The 20,180 ESTs sequenced by the Korea National Institute of Agricultural Biotechnology (NIAB) were aligned and mapped into 168,792 exons. By gene ontology analysis, 6,158, 4,531, and 12,364 proteins in the rice genome were classified under molecular function, cellular component, and biological process, respectively.
Biological Processes
;
Biotechnology
;
Clone Cells
;
Databases, Nucleic Acid
;
Exons
;
Expressed Sequence Tags
;
Gene Ontology
;
Genome*
;
Information Services*
;
Korea
;
Polymorphism, Restriction Fragment Length
;
Oryza
4.FESD II: A Revised Functional Element SNP Database of Human Ethnicities.
Hyun Ju KIM ; Il Hyun KIM ; Ki Hoon SHIN ; Young Kyu PARK ; Hyojin KANG ; Young Joo KIM
Genomics & Informatics 2007;5(4):188-193
The Functional Element SNPs Database (FESD) categorizes functional elements in human genic regions and provides a set of single nucleotide polymorphisms (SNPs) located within each area. Users may select a set of SNPs in specific functional elements with haplotype information and obtain flanking sequences for genotyping. Our previous version of FESD has been improved in several ways. We regenerated all the data in FESD II from recently updated source data such as HapMap, UCSC GoldenPath, dbSNP, OMIM, and TRANSFAC(R). Users can obtain information about tagSNPs and simulate LD blocks for each gene from four ethnicities in the HapMap project on the fly. FESD II employs a Java/JSP web interface for better platform portability and higher speed than PHP in the previous version. As a result, FESD II provides its users with more powerful information about functional element SNPs of human ethnicities.
Databases, Genetic
;
Diptera
;
Haplotypes
;
HapMap Project
;
Humans*
;
Polymorphism, Single Nucleotide
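The FESD II abstract mentions tagSNPs and LD blocks without saying how linkage disequilibrium is computed. As a minimal sketch only, the commonly used pairwise r² statistic can be calculated from phased haplotypes as below. The function name, the 0/1 allele coding, and the toy haplotypes are assumptions made for illustration and are not taken from FESD II.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs,
    each allele coded 0 or 1 (hypothetical coding for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom

# Toy data: perfectly correlated SNPs, then fully independent SNPs.
linked = [(1, 1)] * 4 + [(0, 0)] * 4
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
r2_linked = ld_r2(linked)
r2_unlinked = ld_r2(unlinked)
```

A pair of SNPs with r² near 1 carries largely redundant information, which is the usual reason one of them can stand in for the other as a tagSNP.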
5.REPEATOME: A Database for Repeat Element Comparative Analysis in Human and Chimpanzee.
Taeha WOO ; Tae Hui HONG ; Sang Soo KIM ; Won Hyong CHUNG ; Hyo Jin KANG ; Chang Bae KIM ; Jungmin SEO
Genomics & Informatics 2007;5(4):179-187
An increasing number of primate genomes are being sequenced. A direct comparison of repeat elements in human genes and their corresponding chimpanzee orthologs will not only give information on their evolution, but also shed light on the major evolutionary events that shaped our species. We have developed REPEATOME to enable visualization and subsequent comparisons of human and chimpanzee repeat elements. REPEATOME (http://www.repeatome.org/) provides easy access to a complete repeat element map of the human genome, as well as repeat element-associated information. It provides a convenient and effective way to access the repeat elements within or spanning the functional regions in human and chimpanzee genome sequences. REPEATOME includes information to compare repeat elements and gene structures of human genes and their counterparts in chimpanzee. This database can be accessed using comparative search options such as intersection, union, and difference to find lineage-specific or common repeat elements. REPEATOME allows researchers to perform visualization and comparative analysis of repeat elements in human and chimpanzee.
Genome
;
Genome, Human
;
Humans*
;
Pan troglodytes*
;
Primates
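The comparative search options named in the REPEATOME abstract (intersection, union, and difference) correspond to ordinary set operations. The sketch below assumes that repeat elements for one human gene and its chimpanzee ortholog are available as sets of repeat names. The annotations are illustrative placeholders, not records from the database, and `compare_repeats` is a hypothetical helper.

```python
# Hypothetical repeat annotations for one human gene and its chimpanzee
# ortholog; the assignments are illustrative only.
human_repeats = {"AluSx", "AluY", "L1PA2", "MIR"}
chimp_repeats = {"AluSx", "L1PA2", "MIR", "L2"}

def compare_repeats(human, chimp):
    """Return common and lineage-specific repeat elements."""
    return {
        "common": human & chimp,           # intersection
        "all": human | chimp,              # union
        "human_specific": human - chimp,   # difference (human only)
        "chimp_specific": chimp - human,   # difference (chimpanzee only)
    }

result = compare_repeats(human_repeats, chimp_repeats)
```

Here `result["human_specific"]` and `result["chimp_specific"]` play the role of the lineage-specific elements mentioned in the abstract, and `result["common"]` the shared ones.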
6.Conservation of cis-Regulatory Element Controlling Timely Translation in the 3'-UTR of Selected Mammalian Maternal Transcripts.
Hyun Joo LEE ; Yoonki LIM ; Sang Ho CHANG ; Kwansik MIN ; Ching Tack HAN ; Sue Yun HWANG
Genomics & Informatics 2007;5(4):174-178
The earliest stages of mammalian embryogenesis are governed by the activity of maternally inherited transcripts and proteins. Cytoplasmic polyadenylation of selected maternal mRNA has been reported to be a major control mechanism of delayed translation during preimplantation embryogenesis in mice. The presence of cis-elements required for cytoplasmic polyadenylation (e.g., CPE) can serve as a useful tag in the screening of maternal genes partaking in key functions in the transcriptionally dormant egg and early embryo. However, because the CPE consensus is relatively simple, UA-rich sequences that satisfy it are often found in the 3'-UTR of maternal transcripts that do not actually undergo cytoplasmic polyadenylation. In this study, we developed a method to confirm the validity of candidate CPE sequences in a given gene by a multiplex comparison of 3'-UTR sequences between mammalian homologs. We found that genes undergoing cytoplasmic polyadenylation tend to create a conserved block around the CPE, while CPE-like sequences in the 3'-UTR of genes lacking cytoplasmic polyadenylation do not exhibit such conservation between species. Through this cross-species comparison, we also identified an alternative CPE in the 3'-UTR of tissue-type plasminogen activator (tPA), which is more likely to serve as a functional element. We suggest that verification of CPEs based on sequence conservation can provide a convenient tool for mass screening of factors governing the earliest processes of mammalian embryogenesis.
Animals
;
Consensus Sequence
;
Cytoplasm
;
Embryonic Development
;
Embryonic Structures
;
Female
;
Mass Screening
;
Mice
;
Ovum
;
Polyadenylation
;
Pregnancy
;
RNA, Messenger, Stored
;
Tissue Plasminogen Activator
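The conservation check described in the abstract above can be illustrated with a short sketch. It is not the authors' method. It assumes the 3'-UTRs are already aligned to equal length, uses the commonly cited CPE consensus UUUUA(A)U written in DNA letters, and counts a site as conserved only when the motif starts at the same alignment column in every homolog. The sequences are invented for illustration.

```python
import re

# Assumed CPE consensus (UUUUAU or UUUUAAU) in DNA letters; the lookahead
# lets overlapping candidate sites be reported as well.
CPE_SITE = re.compile(r"(?=TTTTAA?T)")

def conserved_cpe_columns(aligned):
    """Return alignment columns where every homolog has a CPE match.

    `aligned` maps species name -> aligned 3'-UTR string (equal lengths,
    '-' for gaps). Gaps inside a motif are not handled in this sketch.
    """
    per_species = [
        {m.start() for m in CPE_SITE.finditer(seq)}
        for seq in aligned.values()
    ]
    return sorted(set.intersection(*per_species))

# Invented aligned 3'-UTR fragments (hypothetical data).
aligned = {
    "mouse": "GCATTTTATAAGC--TTTTAATCC",
    "human": "GCATTTTATGAGCTTTTTTAATGC",
    "cow":   "GAATTTTATAAGCAATTTTACTCC",
}
conserved = conserved_cpe_columns(aligned)
```

In this toy alignment the site at column 3 is shared by all three sequences, while the second CPE-like site at column 15 is absent in the third sequence and is therefore not reported.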
7.Application of Random Forests to Association Studies Using Mitochondrial Single Nucleotide Polymorphisms.
Genomics & Informatics 2007;5(4):168-173
In previous nuclear genomic association studies, Random Forests (RF), one of several up-to-date machine learning methods, has been used successfully to generate evidence of association of genetic polymorphisms with diseases or other phenotypes. Compared with traditional statistical analytic methods, such as chi-square tests or logistic regression models, the RF method has advantages in handling large numbers of predictor variables and examining gene-gene interactions without a specific model. Here, we applied the RF method to find the association between mitochondrial single nucleotide polymorphisms (mtSNPs) and diabetes risk. The results from a chi-square test validated the usage of RF for association studies using mtDNA. Indexes of important variables such as the Gini index and mean decrease in accuracy index performed well compared with chi-square tests in favor of finding mtSNPs associated with a real disease example, type 2 diabetes.
DNA, Mitochondrial
;
Logistic Models
;
Phenotype
;
Polymorphism, Genetic
;
Polymorphism, Single Nucleotide*
;
Machine Learning
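To make the comparison in the abstract above concrete, the sketch below computes, for a single binary mtSNP, the Pearson chi-square statistic of the 2x2 carrier-by-status table and the Gini impurity decrease obtained by splitting on that SNP. This is one split only, not a Random Forest; a forest's Gini importance accumulates such decreases over many trees. The counts and function names are hypothetical.

```python
def gini(cases, controls):
    """Gini impurity of a node with the given class counts."""
    n = cases + controls
    if n == 0:
        return 0.0
    p = cases / n
    return 2 * p * (1 - p)

def gini_decrease(a, b, c, d):
    """Impurity decrease from splitting on a binary SNP.

    a, b = cases and controls carrying the variant
    c, d = cases and controls without it
    """
    n = a + b + c + d
    parent = gini(a + c, b + d)
    children = ((a + b) / n) * gini(a, b) + ((c + d) / n) * gini(c, d)
    return parent - children

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts: 30 cases / 10 controls carry the variant,
# 20 cases / 40 controls do not.
chi2 = chi_square_2x2(30, 10, 20, 40)
delta = gini_decrease(30, 10, 20, 40)
```

For a single binary split the two quantities are proportional, so they rank an individual SNP the same way. The comparison reported in the abstract concerns importance indexes aggregated across a whole forest.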
8.Genetic Polymorphisms of UGT1A and their Association with Clinical Factors in Healthy Koreans.
Jeong Oh KIM ; Jeong Young SHIN ; Myung Ah LEE ; Hyun Suk CHAE ; Chul Ho LEE ; Jae Sook ROH ; Sun Kyung JIN ; Tae Sun KANG ; Jung Ran CHOI ; Jin Hyoung KANG
Genomics & Informatics 2007;5(4):161-167
Glucuronidation by the uridine diphosphate-glucuronosyltransferase 1A enzymes (UGT1As) is a major pathway for elimination of particular drugs and endogenous substances, such as bilirubin. We examined the relation of eight single nucleotide polymorphisms (SNPs) and haplotypes of the UGT1A gene with clinical factors. For association analysis, we genotyped the variants by direct sequencing analysis and polymerase chain reaction (PCR) in 218 healthy Koreans. The frequencies of the UGT1A1 polymorphisms -3279T>G, -3156G>A, -53 (TA)6>7, 211G>A, and 686C>A were 0.26, 0.12, 0.08, 0.15, and 0.01, respectively. The frequency of -118 (T)9>10 of UGT1A9 was 0.62, which was significantly higher than that in Caucasians (0.39). Neither the -2152C>T nor the -275T>A polymorphism was observed in Koreans or other Asians in comparison with Caucasians. The -3156G>A and -53 (TA)6>7 polymorphisms of UGT1A were significantly associated with platelet count and total bilirubin level (p=0.01, p=0.01, respectively). Additionally, total bilirubin level was positively correlated with occurrence of the UGT1A9 -118 (T)9>10 rare variant. Common haplotypes encompassing six UGT1A polymorphisms were significantly associated with total bilirubin level (p=0.01). Taken together, we suggest that determination of the UGT1A1 and UGT1A9 genotypes is clinically useful for predicting the efficacy and serious toxicities of particular drugs requiring glucuronidation.
Asian Continental Ancestry Group
;
Bilirubin
;
Genotype
;
Haplotypes
;
Humans
;
Platelet Count
;
Polymerase Chain Reaction
;
Polymorphism, Genetic*
;
Polymorphism, Single Nucleotide
;
Uridine
9.A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data.
Genomics & Informatics 2008;6(3):157-159
Multiple sequence alignment (MSA) is the most important step in many biological sequence analyses, homology searches, and protein structural assignments. However, large amounts of data make it difficult for biologists to perform MSA analyses, and aligning many sequences requires much computational time. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amounts of sequence data. We used a cluster server with ClustalW-MPI installed, accessed through web services and the message passing interface (MPI). The tool also enables users to edit multiple sequence alignments manually and to download the input data and results, such as alignments and phylogenetic trees.
Sequence Alignment
;
Sequence Analysis
10.StrokeBase: A Database of Cerebrovascular Disease-related Candidate Genes.
Young Uk KIM ; Il Hyun KIM ; Ok Sun BANG ; Young Joo KIM
Genomics & Informatics 2008;6(3):153-156
Complex diseases such as stroke and cancer have two or more genetic loci and are affected by environmental factors that contribute to the diseases. Due to the complex characteristics of these diseases, identifying candidate genes requires a system-level analysis of the following: gene ontology, pathway, and interactions. A database and user interface, termed StrokeBase, was developed; StrokeBase provides queries that search for pathways, candidate genes, candidate SNPs, and gene networks. The database was developed by using in silico data mining of HGNC, ENSEMBL, STRING, RefSeq, UCSC, GO, HPRD, KEGG, GAD, and OMIM. Forty candidate genes that are associated with cerebrovascular disease were selected by human experts and public databases. The networked cerebrovascular disease gene maps also were developed; these maps describe gene-gene interactions and biological pathways. We identified 1,127 genes, related indirectly to cerebrovascular disease but directly to the etiology of cerebrovascular disease. We found that a protein-protein interaction (PPI) network that was associated with cerebrovascular disease follows the power-law degree distribution that is evident in other biological networks. Not only was in silico data mining utilized, but also 250K Affymetrix SNP chips were utilized in the 320 control/disease association study to generate associated markers that were pertinent to the cerebrovascular disease as a genome-wide search. The associated genes and the genes that were retrieved from the in silico data mining system were compared and analyzed. We developed a well-curated cerebrovascular disease-associated gene network and provided bioinformatic resources to cerebrovascular disease researchers. This cerebrovascular disease network can be used as a frame of systematic genomic research, applicable to other complex diseases. Therefore, the ongoing database efficiently supports medical and genetic research in order to overcome cerebrovascular disease.
Computer Simulation
;
Data Mining
;
Databases, Genetic
;
Gene Regulatory Networks
;
Genes, rel
;
Genetic Loci
;
Genetic Research
;
Humans
;
Polymorphism, Single Nucleotide
;
Stroke
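The StrokeBase abstract reports that its PPI network follows a power-law degree distribution. As a rough illustration of how such a claim can be inspected, the sketch below tabulates node degrees from an undirected edge list and fits a least-squares slope on log-log axes. The edge list is a toy example with placeholder node labels, and a log-log regression is only a quick diagnostic, not a rigorous power-law test or the analysis used by the authors.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Map degree k -> number of nodes with that degree (undirected edges)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

def loglog_slope(dist):
    """Least-squares slope of log(node count) against log(degree)."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy interaction list with placeholder protein labels (hypothetical data).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("E", "F")]
dist = degree_distribution(edges)
slope = loglog_slope(dist)
```

A clearly negative slope on real data (many low-degree nodes, few hubs) is consistent with a heavy-tailed distribution, although it does not by itself establish a power law.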