1.ORF Miner: a Web-based ORF Search Tool.
Genomics & Informatics 2009;7(4):217-219
The primary clue for locating protein-coding regions is the open reading frame (ORF), and determining ORFs is the first step toward gene prediction, especially for prokaryotes. In this respect, we have developed a web-based ORF search tool called ORF Miner. ORF Miner is a graphical analysis utility that determines all possible open reading frames of a selectable minimum size in an input sequence. The tool identifies all open reading frames using alternative genetic codes as well as the standard one and reports a list of ORFs with the corresponding deduced amino acid sequences. ORF Miner can be employed for sequence annotation and can provide a crucial clue for determining actual protein-coding regions.
Amino Acid Sequence; Animals; Ecthyma, Contagious; Genetic Code; Open Reading Frames; Resin Cements
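The kind of search described in the ORF Miner abstract can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names (`find_orfs`, `reverse_complement`), the ATG-to-stop definition of an ORF, and the six-frame scan are assumptions made for the example. The codon table is the standard genetic code (NCBI translation table 1); an alternative code could be passed in through the hypothetical `table` parameter.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base cycling fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_len=30, table=CODON_TABLE):
    """Scan all six frames for ATG...stop ORFs of at least min_len nucleotides.

    Returns (strand, frame, start, end, protein) tuples. Coordinates are
    0-based, end-exclusive, and relative to the strand being read, so
    minus-strand positions refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the open ORF, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and table.get(codon, "X") == "*":
                    if i + 3 - start >= min_len:
                        protein = "".join(
                            table.get(s[j:j + 3], "X") for j in range(start, i, 3)
                        )
                        orfs.append((strand, frame, start, i + 3, protein))
                    start = None
    return orfs
```

Calling `find_orfs(sequence, min_len=300)` would list only ORFs of 300 nucleotides or more, which mirrors the selectable minimum size mentioned in the abstract. Codons containing ambiguous bases are translated as `X` in this sketch.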
2.WebChemDB: An Integrated Chemical Database Retrieval System.
Bo Kyeng HOU ; Eun Joung MOON ; Sung Chul MOON ; Hae Jin KIM
Genomics & Informatics 2009;7(4):212-216
WebChemDB is an integrated chemical database retrieval system that provides access to over 8 million publicly available chemical structures, including related information on their biological activities and direct links to other public chemical resources, such as PubChem, ChEBI, and DrugBank. The data are publicly available over the web, using two-dimensional (2D) and three-dimensional (3D) structure retrieval systems with various filters and molecular descriptors. The web services API also provides researchers with functionalities to programmatically manipulate, search, and analyze the data.
Databases, Chemical; Subject Headings
3.Microarray Data Analysis of Perturbed Pathways in Breast Cancer Tissues.
Changsik KIM ; Jiwon CHOI ; Sukjoon YOON
Genomics & Informatics 2008;6(4):210-222
Due to the polygenic nature of cancer, it is believed that breast cancer is caused by the perturbation of multiple genes and their complex interactions, which contribute to the wide range of disease phenotypes. A systems biology approach that identifies subnetworks of interconnected genes as functional modules is required to understand the complex nature of diseases such as breast cancer. In this study, we apply a 3-step strategy for the interpretation of microarray data, focusing on identifying significantly perturbed metabolic pathways rather than analyzing a large number of overexpressed and underexpressed individual genes. The selected pathways are considered to be dysregulated functional modules that putatively contribute to the progression of disease. The subnetwork of protein-protein interactions for these dysregulated pathways is constructed for further detailed analysis. We evaluated the method by analyzing microarray datasets of breast cancer tissues; i.e., normal and invasive breast cancer tissues. Using this strategy, we selected several significantly perturbed pathways that are implicated in regulating the progression of breast cancer, including the extracellular matrix-receptor interaction pathway and the focal adhesion pathway. Moreover, these selected pathways include several known breast cancer-related genes. We conclude that the present strategy is capable of selecting interesting perturbed pathways that putatively play a role in the progression of breast cancer and that it improves the interpretability of networks of protein-protein interactions.
Breast; Breast Neoplasms; Focal Adhesions; Metabolic Networks and Pathways; Microarray Analysis; Phenotype; Statistics as Topic; Systems Biology
4.Biological Pathway Extension Using Microarray Gene Expression Data.
Tae Su CHUNG ; Jihun KIM ; Keewon KIM ; Ju Han KIM
Genomics & Informatics 2008;6(4):202-209
Biological pathways are known as collections of knowledge about certain biological processes. Although knowledge about a pathway is quite significant for further analysis, it covers only a tiny portion of the genes that exist. In this paper, we suggest a model to extend each individual pathway using microarray expression data, based on existing knowledge about the pathway. We use the Rosetta compendium dataset to extend pathways of Saccharomyces cerevisiae obtained from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. Before applying our model, we verify the underlying assumption that microarray data reflect the interaction knowledge encoded in a pathway, and we evaluate our scoring system by introducing a performance function. In the last step, we validate the proposed candidates with the help of another type of biological information. In summary, we introduce a pathway extension model that uses the intrinsic structure of a pathway together with microarray expression data. The model provides suitable candidate genes for extending each individual biological pathway.
Biological Processes; Gene Expression; Saccharomyces cerevisiae
5.Erythropoietin-producing Human Hepatocellular Carcinoma Receptor B1 Polymorphisms are Associated with HBV-infected Chronic Liver Disease and Hepatocellular Carcinoma in a Korean Population.
Kyoung Yeon KIM ; Seung Ku LEE ; Min Ho KIM ; Jae Youn CHEONG ; Sung Won CHO ; Kap Seok YANG ; KyuBum KWACK
Genomics & Informatics 2008;6(4):192-201
Erythropoietin-producing human hepatocellular carcinoma receptor B1 (EPHB1) is a member of the Eph family of receptor tyrosine kinases that mediate vascular system development. Eph receptor overexpression has been observed in various cancers and is related to the malignant transformation, metastasis, and differentiation of cancers, including hepatocellular carcinoma (HCC). Eph receptors regulate cell migration and attachment to the extracellular matrix by modulating integrin activity. EphrinB1, the ligand of EPHB1, has been shown to regulate HCC carcinogenesis. Here, we sought to determine whether EPHB1 polymorphisms are associated with hepatitis B virus (HBV)-infected liver diseases, including chronic liver disease (CLD) and HCC. We genotyped 26 EPHB1 single nucleotide polymorphisms (SNPs) in 399 Korean CLD, HCC, and LD (CLD+HCC) cases and seroconverted controls (HBV clearance, CLE) using the GoldenGate assay. Two SNPs (rs6793828 and rs11717042) and 1 haplotype composed of these SNPs were associated with an increased risk for CLD, HCC, and LD (CLD+HCC) compared with CLE. Haplotypes that could be associated with HBV-infected liver diseases by affecting downstream signaling were located in the Eph tyrosine kinase domain of EPHB1. Therefore, we suggest that EPHB1 SNPs, haplotypes, and diplotypes may be genetic markers for the progression of HBV-associated acute hepatitis to CLD and HCC.
Carcinoma, Hepatocellular; Cell Movement; Extracellular Matrix; Genetic Markers; Haplotypes; Hepatitis; Hepatitis B virus; Humans; Liver; Liver Diseases; Neoplasm Metastasis; Phosphotransferases; Polymorphism, Single Nucleotide; Protein-Tyrosine Kinases; Receptor, EphA1; Receptors, Eph Family; Tyrosine
6.Polymorphisms in RAS Guanyl-releasing Protein 3 are Associated with Chronic Liver Disease and Hepatocellular Carcinoma in a Korean Population.
Ah Reum OH ; Seung Ku LEE ; Min Ho KIM ; Jae Youn CHEONG ; Sung Won CHO ; Kap Seok YANG ; KyuBum KWACK
Genomics & Informatics 2008;6(4):181-191
RAS guanyl-releasing protein 3 (RasGRP3), a member of the Ras subfamily of GTPases, functions as a guanosine triphosphate (GTP)/guanosine diphosphate (GDP)-regulated switch that cycles between inactive GDP-bound and active GTP-bound states during signal transduction. Various growth factors enhance hepatocellular carcinoma (HCC) proliferation via activation of the Ras/Raf-1/extracellular signal-regulated kinase (ERK) pathway, which depends on RasGRP3 activation. We investigated the relationship between polymorphisms in RasGRP3 and progression of hepatitis B virus (HBV)-infected HCC in a Korean population. Nineteen RasGRP3 SNPs were genotyped in 206 patients with chronic liver disease (CLD) and 86 patients with HCC. Our results revealed that the T allele of the rs7597095 SNP and the C allele of the rs7592762 SNP increased susceptibility to HCC (OR=1.55, p=0.04 and OR=1.81~2.61, p=0.01~0.03, respectively). Moreover, patients who possessed the haplotype (ht) 1 (A-T-C-G) or diplotype (dt) 1 (ht1/ht1) variations had increased susceptibility to HCC (OR=1.79~2.78, p=0.01~0.03). In addition, we identified an association between ht1 and the age of HCC onset; the age of HCC onset was earlier in ht1 +/+ than in ht1 +/- or ht1 -/- (HR=0.42~0.66, p=0.006~0.015). Thus, our data suggest that RasGRP3 SNPs are significantly associated with an increased risk of developing HCC.
Alleles; Carcinoma, Hepatocellular; GTP Phosphohydrolases; Guanosine Triphosphate; Haplotypes; Hepatitis B virus; Humans; Intercellular Signaling Peptides and Proteins; Liver; Liver Diseases; Phospholipase C gamma; Phosphotransferases; Polymorphism, Single Nucleotide; Polyphosphates; Signal Transduction
7.Identification and Characterization of Human Genes Targeted by Natural Selection.
Ha Jung RYU ; Young Joo KIM ; Young Kyu PARK ; Jae Jung KIM ; Mi Young PARK ; Eul Ju SEO ; Han Wook YOO ; In Sook PARK ; Bermseok OH ; Jong Keuk LEE
Genomics & Informatics 2008;6(4):173-180
The human genome has evolved as a consequence of evolutionary forces, such as natural selection. In this study, we investigated natural selection on human genes by comparing the numbers of nonsynonymous (NS) and synonymous (S) mutations in individual genes. We initially collected coding SNP data for all human genes from the public dbSNP database. Among the human genes, we selected 3 different selection groups: positively selected genes (NS/S ≥ 3), negatively selected genes (NS/S ≤ 1/3), and neutral selection genes (0.9 < NS/S < 1.1). We then characterized the human genes targeted by natural selection. Negatively selected human genes, but not positively selected genes, were markedly associated with disease occurrence. Interestingly, positively selected genes displayed an increase in potentially deleterious nonsynonymous SNPs with an increased frequency of tryptophan and tyrosine residues, suggesting a correlation with protective effects against human disease. Furthermore, our nonsynonymous/synonymous ratio data imply that specific human genes, such as ALMS1 and SPTBN5, are differentially selected among distinct populations. We confirmed that inferences of natural selection using the NS/S ratio can be used extensively to identify functional genes selected during the evolutionary adaptation process.
Clinical Coding; Genome, Human; Humans; Polymorphism, Single Nucleotide; Selection, Genetic; Tryptophan; Tyrosine
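The three selection groups defined in the abstract above can be expressed as a small classifier. This is only a sketch of the stated thresholds: the function name `classify_selection`, the "unclassified" and "undefined" labels, and the handling of genes with no synonymous SNPs are assumptions added for the example and are not described in the source.

```python
from fractions import Fraction


def classify_selection(ns, s):
    """Assign a gene to a selection group from its NS and S SNP counts.

    Thresholds follow the abstract: NS/S >= 3 is positive, NS/S <= 1/3 is
    negative, and 0.9 < NS/S < 1.1 is neutral. Both counts must be integers.
    """
    if s == 0:
        # Assumption: the ratio is left undefined when there are no synonymous SNPs.
        return "undefined"
    ratio = Fraction(ns, s)  # exact arithmetic avoids float edge cases at 1/3
    if ratio >= 3:
        return "positive"
    if ratio <= Fraction(1, 3):
        return "negative"
    if Fraction(9, 10) < ratio < Fraction(11, 10):
        return "neutral"
    # Assumption: ratios between the three groups are not assigned to any group.
    return "unclassified"
```

For instance, a gene with 9 nonsynonymous and 3 synonymous coding SNPs has NS/S = 3 and falls in the positively selected group.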
8.In Silico Functional Assessment of Sequence Variations: Predicting Phenotypic Functions of Novel Variations.
Genomics & Informatics 2008;6(4):166-172
A multitude of protein-coding sequence variations (CVs) in the human genome have been revealed as a result of major initiatives, including the Human Variome Project, the 1000 Genomes Project, and the International Cancer Genome Consortium. This naturally has led to debate over how to accurately assess the functional consequences of CVs, because predicting the functional effects of CVs and their relevance to disease phenotypes is becoming increasingly important. This article surveys and compares variation databases and in silico prediction programs that assess the effects of CVs on protein function. We also introduce a combinatorial approach that uses machine learning algorithms to improve prediction performance.
Amino Acid Substitution; Computer Simulation; Genome; Genome, Human; Humans; Mutation, Missense; Phenotype; Machine Learning
9.Personal Genomics, Bioinformatics, and Variomics.
Jong BHAK ; Ho GHANG ; Rohit REJA ; Sangsoo KIM
Genomics & Informatics 2008;6(4):161-165
In 2008 at least five complete genome sequences are available. It is known that there are over 15,000,000 genetic variants, called SNPs, in the dbSNP database. The cost of full genome sequencing in 2009 is claimed to be less than US$5,000. The genomics era has arrived in 2008. This review introduces technologies, bioinformatics, genomics visions, and variomics projects. Variomics is the study of the total genetic variation in an individual and in populations. Research on genetic variation is the most valuable among the many branches of genomics research. Genomics and variomics projects will change biology and society so dramatically that biology will become an everyday technology like personal computers and the internet. 'BioRevolution' is the term that can adequately describe this change.
Biology; Computational Biology; Genetic Variation; Genome; Genomics; Humans; Internet; Microcomputers; Polymorphism, Single Nucleotide; Vision, Ocular
10.CpG Islands Detector: a Window-based CpG Island Search Tool.
Genomics & Informatics 2010;8(1):58-61
CpG is the pair of nucleotides C and G, appearing successively, in this order, along one DNA strand. It is known that, due to biochemical considerations, CpG is relatively rare in most DNA sequences. However, in particular subsequences, which are a few hundred to a few thousand nucleotides long, the CpG pair is more frequent. These subsequences, called CpG islands, are known to appear in biologically more significant parts of the genome. The ability to identify CpG islands along a chromosome will therefore help us spot its more significant regions of interest, such as the promoters or 'start' regions of many genes. In this respect, I developed a CpG island search tool, CpG Islands Detector, which was implemented in Java so that it can run on any platform. The window-based graphical user interface of CpG Islands Detector may help end users employ this tool to pinpoint CpG islands in a genomic DNA sequence. In addition, this tool can be used to highlight potential genes in genomic sequences, since CpG islands are very often found in the 5' regions of vertebrate genes.
Base Sequence; CpG Islands; DNA; Genome; Indonesia; Nucleotides; Vertebrates
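A CpG island scan of the kind described in the CpG Islands Detector abstract can be sketched with a sliding window. The abstract does not state which criteria the tool applies, so this example assumes the widely used Gardiner-Garden and Frommer thresholds (window of 200 nt, GC content of at least 50%, observed/expected CpG ratio of at least 0.6). The function names `cpg_stats` and `scan_cpg_windows` are hypothetical, and the sketch is written in Python rather than the Java used by the tool.

```python
def cpg_stats(window):
    """Return (GC content, observed/expected CpG ratio) for one window."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")      # observed CpG dinucleotides
    expected = c * g / n          # expected CpG count given the base composition
    obs_exp = cpg / expected if expected else 0.0
    return (c + g) / n, obs_exp


def scan_cpg_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return merged (start, end) regions whose windows pass both thresholds.

    Coordinates are 0-based and end-exclusive. The default thresholds are an
    assumption (Gardiner-Garden and Frommer criteria), not the tool's settings.
    """
    seq = seq.upper()
    merged = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_oe:
            end = start + window
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent hit: extend the previous region.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
    return merged
```

Sequences shorter than the window produce no regions in this sketch; a larger `step` trades resolution for speed on chromosome-scale input.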