1. Review on the research progress of mining of OMIM data.
Jianhua LI ; Zheren LI ; Yan KANG ; Ling LI
Journal of Biomedical Engineering 2014;31(6):1400-1404
Online Mendelian Inheritance in Man (OMIM) is a knowledge source and database for human genetic diseases and related genes. Each OMIM entry includes a clinical synopsis, linkage analysis for candidate genes, chromosomal localization and animal models, and the database has become an authoritative source of information for the study of the relationship between genes and diseases. As overlap of disease symptoms may reflect interactions at the molecular level, comparison of phenotypic similarity may indicate candidate genes and help to discover functional connections between genes and proteins. However, OMIM uses free text to describe disease phenotypes, which does not suit computer analysis. Standardization of OMIM data therefore has important implications for large-scale comparison of disease phenotypes and prediction of phenotype-genotype correlations. Recently, standard medical language systems, term frequency-inverse document frequency (TF-IDF) weighting and cosine similarity for document classification have been introduced for mining of OMIM data. Combined with Gene Ontology and various comparison methods, these approaches have achieved substantial successes. In this article, we review various methods for standardization and similarity comparison of OMIM data. We also discuss likely trends for research in this direction.
Databases, Genetic; Humans; Phenotype
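The TF-IDF weighting and cosine comparison mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the method of any reviewed paper: the disease names and phenotype term lists are hypothetical, and the weighting (relative term frequency times the natural log of inverse document frequency) is one common variant chosen here as an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical standardized phenotype terms (not real OMIM records).
phenotypes = {
    "disease_a": ["seizures", "ataxia", "hypotonia"],
    "disease_b": ["seizures", "ataxia", "deafness"],
    "disease_c": ["cardiomyopathy", "arrhythmia", "deafness"],
}
names = list(phenotypes)
vecs = dict(zip(names, tfidf_vectors([phenotypes[n] for n in names])))

sim_ab = cosine(vecs["disease_a"], vecs["disease_b"])  # share two terms
sim_ac = cosine(vecs["disease_a"], vecs["disease_c"])  # share no terms
```

In this toy setup, disease_a and disease_b share two terms and score higher than disease_a and disease_c, which share none. Terms that occur in every document receive zero weight under this variant.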
2. Application of Online Mendelian Inheritance in Man to medical genetics.
National Journal of Andrology 2011;17(7):639-643
Online Mendelian Inheritance in Man (OMIM, http://omim.org/) is a comprehensive, authoritative, practical and timely knowledgebase of human genes and genetic disorders. OMIM, as a genetic encyclopedia, provides easy and straightforward access to information on human genetics for students, researchers and clinicians. This article presents an overview of the contents of OMIM and its application to medical genetics.
Databases, Factual; Databases, Genetic; Genetics, Medical; Humans
3. D2GSNP: a web server for the selection of Single Nucleotide Polymorphisms within human disease genes.
Hyo Jin KANG ; Tae Hui HONG ; Won Hyong CHUNG ; Young Uk KIM ; Jin Hee JUNG ; So Hyun HWANG ; A Reum HAN ; Young Joo KIM
Genomics & Informatics 2006;4(1):45-47
D2GSNP is a web-based server for the selection of single nucleotide polymorphisms (SNPs) within genes related to human diseases. D2GSNP is based on a relational database created by downloading and parsing OMIM, GAD, and dbSNP, and merging them with positional information from UCSC Golden Path. In total, our server provides 5,142 and 1,932 non-redundant disease genes from OMIM and GAD, respectively. With the D2GSNP web interface, users can select SNPs within genes corresponding to certain diseases and get their flanking sequences for further genotyping experiments such as association studies.
Databases, Genetic; Humans*; Polymorphism, Single Nucleotide*
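The idea of merging disease-gene records with positional information and then selecting SNPs that fall inside a gene can be illustrated with a small relational sketch. Everything below is an assumption made for illustration: the table names, column names, gene symbols, SNP identifiers and coordinates are hypothetical and are not taken from D2GSNP, OMIM, GAD, dbSNP or UCSC.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE disease_gene (
    gene TEXT, disease TEXT, chrom TEXT, tx_start INTEGER, tx_end INTEGER
);
CREATE TABLE snp (rs_id TEXT, chrom TEXT, pos INTEGER);
""")
conn.executemany(
    "INSERT INTO disease_gene VALUES (?, ?, ?, ?, ?)",
    [
        ("GENE_A", "disease X", "chr1", 1000, 2000),
        ("GENE_B", "disease Y", "chr2", 500, 900),
    ],
)
conn.executemany(
    "INSERT INTO snp VALUES (?, ?, ?)",
    [
        ("rs_demo_1", "chr1", 1500),
        ("rs_demo_2", "chr1", 2500),
        ("rs_demo_3", "chr2", 700),
        ("rs_demo_4", "chr1", 1000),
    ],
)


def snps_for_disease(connection, disease):
    """Return (gene, rs_id, pos) for SNPs located inside genes of a disease."""
    return connection.execute(
        """
        SELECT g.gene, s.rs_id, s.pos
        FROM disease_gene AS g
        JOIN snp AS s
          ON s.chrom = g.chrom
         AND s.pos BETWEEN g.tx_start AND g.tx_end
        WHERE g.disease = ?
        ORDER BY s.pos
        """,
        (disease,),
    ).fetchall()


rows = snps_for_disease(conn, "disease X")
```

The join keeps only SNPs on the same chromosome whose position lies within the gene boundaries (inclusive), so rs_demo_2, which lies past the end of GENE_A, is excluded.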
4. Introduction of Bioinformatic Methods for the Gene Function Analysis.
The Korean Journal of Hepatology 2004;10(1):11-21
No abstract available.
Computational Biology; *Databases, Genetic; *Genome, Human; Humans
5. MediScore: MEDLINE-based Interactive Scoring of Gene and Disease Associations.
Hye Young CHO ; Bermseok OH ; Jong Keuk LEE ; Kuchan KIMM ; InSong KOH
Genomics & Informatics 2004;2(3):131-133
MediScore is an information retrieval system that helps to search for the set of genes associated with a specific disease or the set of diseases associated with a specific gene. Despite recent improvements in natural language processing (NLP) and other text mining approaches to searching for disease-associated genes, many false positive results arise from the diversity of exceptional cases as well as from ambiguities in gene names. To overcome these weak points of current text mining approaches, MediScore introduces two measures: statistical normalization based on the binomial-to-normal distribution approximation, which corrects inaccurate scores caused by common words that do not represent genes, and interactive rescoring by the user to remove false positive results. Interactive rescoring includes individual alias scoring for each gene to remove false gene synonyms, reference to MEDLINE abstracts, and cross-referencing between OMIM and other related information.
Data Mining; Databases, Genetic; Information Systems; Natural Language Processing
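The abstract above does not give the scoring formula, so the following is only a sketch of how a binomial-to-normal approximation can normalize co-occurrence counts. It assumes, hypothetically, that the number of abstracts mentioning both a disease and a gene alias follows a binomial distribution under independence, and it converts the observed count into a z-score. The function name, parameters and counts are all illustrative.

```python
import math


def cooccurrence_z(k, n_disease, n_gene, n_total):
    """z-score of k co-occurrences under a normal approximation.

    Assumption: if the alias were unrelated to the disease, each of the
    n_disease abstracts would mention it with probability n_gene / n_total,
    so k ~ Binomial(n_disease, p), approximated by Normal(np, np(1 - p)).
    """
    p = n_gene / n_total
    mean = n_disease * p
    variance = n_disease * p * (1.0 - p)
    if variance == 0.0:
        return 0.0
    return (k - mean) / math.sqrt(variance)


# Hypothetical counts: a common English word used as a gene alias appears in
# many abstracts, so a high raw co-occurrence count is unremarkable.
z_common = cooccurrence_z(k=220, n_disease=2000, n_gene=100_000, n_total=1_000_000)

# A specific gene symbol appears rarely, so 40 co-occurrences stand out.
z_specific = cooccurrence_z(k=40, n_disease=2000, n_gene=500, n_total=1_000_000)
```

Although the common-word alias has the larger raw count (220 versus 40), its normalized score is much lower, which is the kind of correction the abstract attributes to statistical normalization.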
6. Construction of EST Database for Comparative Gene Studies of Acanthamoeba.
Eun Kyung MOON ; Joung Ok KIM ; Ying Hua XUAN ; Young Sun YUN ; Se Won KANG ; Yong Seok LEE ; Tae In AHN ; Yeon Chul HONG ; Dong Il CHUNG ; Hyun Hee KONG
The Korean Journal of Parasitology 2009;47(2):103-107
The genus Acanthamoeba can cause severe infections such as granulomatous amebic encephalitis and amebic keratitis in humans. However, little genomic information on Acanthamoeba has been reported. Here, we constructed an Acanthamoeba expressed sequence tag (EST) database (Acanthamoeba EST DB) derived from our 4 kinds of Acanthamoeba cDNA library. The Acanthamoeba EST DB contains 3,897 ESTs generated from amebae under various conditions of long-term in vitro culture, mouse brain passage, or encystation, together with Acanthamoeba data downloaded from the National Center for Biotechnology Information (NCBI) and the Taxonomically Broad EST Database (TBestDB). Almost all reported cDNA/genomic sequences of Acanthamoeba are provided through a stand-alone BLAST system with nucleotide (BLAST NT) and amino acid (BLAST AA) sequence databases. In BLAST results, each gene links to relevant information including sequence data, gene orthology annotations, relevant references, and a BlastX result. This is the first attempt to construct an Acanthamoeba database of genes expressed under diverse conditions. These data were integrated into a database (http://www.amoeba.or.kr).
Acanthamoeba/*genetics; Animals; *Databases, Genetic; *Expressed Sequence Tags
7. FESD II: A Revised Functional Element SNP Database of Human Ethnicities.
Hyun Ju KIM ; Il Hyun KIM ; Ki Hoon SHIN ; Young Kyu PARK ; Hyojin KANG ; Young Joo KIM
Genomics & Informatics 2007;5(4):188-193
The Functional Element SNPs Database (FESD) categorizes functional elements in human genic regions and provides a set of single nucleotide polymorphisms (SNPs) located within each area. Users may select a set of SNPs in specific functional elements with haplotype information and obtain flanking sequences for genotyping. Our previous version of FESD has been improved in several ways. We regenerated all the data in FESD II from recently updated source data such as HapMap, UCSC GoldenPath, dbSNP, OMIM, and TRANSFAC(R). Users can obtain information about tagSNPs and simulate LD blocks for each gene from four ethnicities in the HapMap project on the fly. FESD II employs a Java/JSP web interface for better platform portability and higher speed than PHP in the previous version. As a result, FESD II provides its users with more powerful information about functional element SNPs of human ethnicities.
Databases, Genetic; Diptera; Haplotypes; HapMap Project; Humans*; Polymorphism, Single Nucleotide
8. The OAuth 2.0 Web Authorization Protocol for the Internet Addiction Bioinformatics (IABio) Database.
Jeongseok CHOI ; Jaekwon KIM ; Dong Kyun LEE ; Kwang Soo JANG ; Dai Jin KIM ; In Young CHOI
Genomics & Informatics 2016;14(1):20-28
Internet addiction (IA) has become a widespread and problematic phenomenon as smart devices pervade society. Moreover, internet gaming disorder leads to increases in social expenditures for both individuals and nations alike. Although the prevention and treatment of IA are getting more important, the diagnosis of IA remains problematic. Understanding the neurobiological mechanism of behavioral addictions is essential for the development of specific and effective treatments. Although there are many databases related to other addictions, a database for IA has not been developed yet. In addition, bioinformatics databases, especially genetic databases, require a high level of security and should be designed based on medical information standards. In this respect, our study proposes the OAuth standard protocol for database access authorization. The proposed IA Bioinformatics (IABio) database system is based on internet user authentication, which is a guideline for medical information standards, and uses OAuth 2.0 for access control technology. This study designed and developed the system requirements and configuration. The OAuth 2.0 protocol is expected to establish the security of personal medical information and be applied to genomic research on IA.
Computational Biology*; Databases, Genetic; Diagnosis; Health Expenditures; Humans; Internet*
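The abstract above does not say which OAuth 2.0 grant type the IABio system uses. As a hedged illustration only, the sketch below builds the two client-side requests of the standard authorization code grant and checks the `state` value returned on the redirect. The endpoint URL, client identifier, redirect URI and scope name are hypothetical and are not taken from the paper.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical authorization endpoint; not the real IABio address.
AUTHORIZE_ENDPOINT = "https://iabio.example/oauth/authorize"


def build_authorization_url(client_id, redirect_uri, scope):
    """Return (url, state) for an authorization code request."""
    state = secrets.token_urlsafe(16)  # random value to detect forged callbacks
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}", state


def build_token_request(code, client_id, redirect_uri):
    """Return the form fields a client would POST to the token endpoint."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    }


def state_matches(callback_url, expected_state):
    """Check that the state echoed on the redirect equals the one we sent."""
    returned = parse_qs(urlparse(callback_url).query).get("state", [""])[0]
    return secrets.compare_digest(returned.encode(), expected_state.encode())


url, state = build_authorization_url(
    "demo-client", "https://client.example/cb", "read:genotypes"
)
callback = f"https://client.example/cb?code=abc123&state={state}"
token_fields = build_token_request("abc123", "demo-client", "https://client.example/cb")
```

The sketch only constructs requests and performs no network calls. A real deployment would also authenticate the client at the token endpoint and validate the issued access token on every database request.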
9. Unique Phylogenetic Lineage Found in the Fusarium-like Clade after Re-examining BCCM/IHEM Fungal Culture Collection Material.
David TRIEST ; Koen DE CREMER ; Denis PIÉRARD ; Marijke HENDRICKX
Mycobiology 2016;44(3):121-130
Recently, the Fusarium genus has been narrowed based upon phylogenetic analyses and a Fusarium-like clade was adopted. The few species of the Fusarium-like clade were moved to new, re-installed or existing genera or provisionally retained as "Fusarium." Only a limited number of reference strains and DNA marker sequences are available for this clade and not much is known about its actual species diversity. Here, we report six strains, preserved by the Belgian fungal culture collection BCCM/IHEM as Fusarium species, that belong to the Fusarium-like clade. They showed slow growth and produced pionnotes, typical morphological characteristics of many Fusarium-like species. Multilocus sequencing with comparative sequence analyses in GenBank and phylogenetic analyses, using reference sequences of type material, confirmed that they were indeed members of the Fusarium-like clade. One strain was identified as "Fusarium" ciliatum whereas another strain was identified as Fusicolla merismoides. The four remaining strains were shown to represent a unique phylogenetic lineage in the Fusarium-like clade and were also found to be morphologically distinct from other members of the clade. Based upon phylogenetic considerations, a new genus, Pseudofusicolla gen. nov., and a new species, Pseudofusicolla belgica sp. nov., were installed for this lineage. A formal description is provided in this study. Additional sampling will be required to gather isolates other than the historical strains presented here and to further reveal the actual species diversity in the Fusarium-like clade.
Databases, Nucleic Acid; Fusarium; Genetic Markers; Phylogeny; Sequence Analysis
10. CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing.
Guoguang ZHAO ; Dechao BU ; Changning LIU ; Jing LI ; Jian YANG ; Zhiyong LIU ; Yi ZHAO ; Runsheng CHEN
Protein & Cell 2012;3(2):148-152
Estimating taxonomic content constitutes a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with the currently available software. Here, we present CloudLCA, a parallel LCA algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time nearly linear in the size of the dataset, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. In comparison with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN when running on a fat node. CloudLCA can be run on one multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis: it generates the same output as MEGAN, which can be imported directly into MEGAN to complete the subsequent analysis. Moreover, CloudLCA is a universal solution for finding the lowest common ancestor, and it can be applied in other fields requiring an LCA algorithm.
Algorithms; Databases, Genetic; Metagenomics; Search Engine; User-Computer Interface
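The lowest common ancestor step referred to in the abstract above can be illustrated for a single read as follows. This is a plain serial sketch and not the parallel CloudLCA implementation: it assumes a taxonomy stored as a child-to-parent mapping, and the small taxonomy and the function names are chosen only for illustration.

```python
def path_to_root(taxon, parent):
    """Return the list of nodes from a taxon up to the root."""
    path = [taxon]
    while taxon in parent:  # the root has no entry in the mapping
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    first_path = path_to_root(taxa[0], parent)
    common = set(first_path)
    for taxon in taxa[1:]:
        common &= set(path_to_root(taxon, parent))
    # first_path runs from leaf to root, so the first shared node is the LCA.
    for node in first_path:
        if node in common:
            return node
    raise ValueError("taxa do not share a root")


# Simplified illustrative taxonomy (child -> parent).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacteria",
    "Bacteria": "root",
}

# A read whose database hits fall in two genera is assigned to their family.
read_hits = ["Escherichia coli", "Salmonella enterica"]
assignment = lowest_common_ancestor(read_hits, parent)
```

Because each read is assigned independently of the others, the per-read computation can be distributed across processors, which is consistent with the scaling behaviour the abstract reports.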