1.Identification of Caenorhabditis elegans MicroRNA Targets Using a Kernel Method.
Wha Jin LEE ; Jin Wu NAM ; Sung Kyu KIM ; Byoung Tak ZHANG
Genomics & Informatics 2005;3(1):15-23
BACKGROUND: MicroRNAs (miRNAs) are a class of noncoding RNAs found in various organisms such as plants and mammals. However, most of the mRNAs regulated by miRNAs are unknown. Furthermore, miRNA targets in genomes cannot be identified by standard sequence comparison, since their complementarity to the target sequence is generally imperfect. In this paper, we propose a kernel-based method for the efficient prediction of miRNA targets. To help distinguish false positives from potentially valid targets, we elucidate the features common to experimentally confirmed targets. RESULTS: The performance of our prediction method was evaluated by five-fold cross-validation. Our method achieved a sensitivity of 0.64 and a specificity of 0.98. The proposed method also reduced the number of false positives by half compared with TargetScan. We investigated the effect of feature sets on the classification of miRNA targets. Finally, we predicted miRNA targets for several miRNAs in the Caenorhabditis elegans (C. elegans) 3' untranslated region (3' UTR) database. CONCLUSIONS: The targets predicted by the proposed method will help in validating more miRNA targets and ultimately in revealing the role of small RNAs in the regulation of genomes. Our algorithm for miRNA target site detection can be improved with additional experimental knowledge. In addition, an increase in the number of confirmed targets is expected to reveal general structural features that can be used to improve their detection.
Caenorhabditis elegans*; Caenorhabditis*; Classification; Genome; Mammals; MicroRNAs*; RNA; RNA, Messenger; RNA, Untranslated; Sensitivity and Specificity
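Note on the sensitivity and specificity reported in entry 1: the sketch below shows how these two measures are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration. They are not taken from the paper, which reports only the resulting values.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true targets that are recovered.
    specificity = TN / (TN + FP): fraction of non-targets correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts (NOT from the paper), picked to give 0.64 and 0.98.
sens, spec = sensitivity_specificity(tp=16, fn=9, tn=98, fp=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under five-fold cross-validation, such counts would typically be accumulated or averaged over the five held-out folds; the abstract does not say which was done.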
2.Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering.
Jeong Ho CHANG ; Sung Wook CHI ; Byoung Tak ZHANG
Genomics & Informatics 2003;1(1):32-39
We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations in gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.
Cell Cycle; Cluster Analysis*; Gene Expression Profiling*; Gene Expression*; Saccharomyces cerevisiae; Yeasts
3.PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis.
Jae Hong EOM ; Byoung Tak ZHANG
Genomics & Informatics 2004;2(2):99-106
In this paper, we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to extract useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes, and extracts the interactions described in a document through natural language processing. The extracted interactions are further analyzed with a set of features for each entity, collected from related public databases, to infer additional interactions from the original ones. Both the inferred interactions and the originally extracted interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested on selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.
Data Mining*; Mining; Natural Language Processing; Machine Learning
4.Classification of Human Papillomavirus (HPV) Risk Type via Text Mining.
Seong Bae PARK ; Sohyun HWANG ; Byoung Tak ZHANG
Genomics & Informatics 2003;1(2):80-86
Human Papillomavirus (HPV) infection is known as the main factor in cervical cancer, which is a leading cause of cancer deaths in women worldwide. Because there are more than 100 types of HPV, it is critical to discriminate the HPVs related to cervical cancer from those not related to it. In this paper, we classify the risk type of HPVs using their textual descriptions. The important issue in this problem is to distinguish false negatives from false positives. That is, we must find as many high-risk HPVs as possible, even though we may misclassify some low-risk HPVs. For this purpose, AdaCost, a cost-sensitive learner, is adopted to account for different costs among training examples. The experimental results on the HPV sequence database show that considering costs gives higher performance. The improvement in F-score is larger than that in accuracy, which implies that the number of high-risk HPVs found is increased.
Classification*; Data Mining*; Female; Humans*; Uterine Cervical Neoplasms
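Note on the F-score argument in entry 4: the sketch below illustrates why a larger gain in F-score than in accuracy points to more high-risk (positive) cases being found. It only does the metric arithmetic and is not an implementation of AdaCost. All counts and names are hypothetical and are not results from the paper.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall; true negatives are ignored."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical test set: 20 high-risk and 80 low-risk types (made-up numbers).
baseline = dict(tp=10, fn=10, fp=2, tn=78)        # cost-insensitive learner
cost_sensitive = dict(tp=16, fn=4, fp=6, tn=74)   # trades false positives for recall

for name, c in (("baseline", baseline), ("cost-sensitive", cost_sensitive)):
    acc = accuracy(**c)
    f1 = f1_score(c["tp"], c["fn"], c["fp"])
    print(f"{name}: accuracy={acc:.3f} F1={f1:.3f}")

acc_gain = accuracy(**cost_sensitive) - accuracy(**baseline)
f1_gain = (f1_score(16, 4, 6) - f1_score(10, 10, 2))
```

In this made-up case the accuracy barely moves because the low-risk class dominates, while the F-score rises sharply because six more high-risk types are recovered.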