1.Introduction to BLAH5 special issue: recent progress on interoperability of biomedical text mining
Jin Dong KIM ; Kevin Bretonnel COHEN ; Nigel COLLIER ; Zhiyong LU ; Fabio RINALDI
Genomics & Informatics 2019;17(2):e12-
No abstract available.
Data Mining
2.Use of Graph Database for the Integration of Heterogeneous Biological Data.
Byoung Ha YOON ; Seon Kyu KIM ; Seon Young KIM
Genomics & Informatics 2017;15(1):19-27
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
Biology; Data Mining
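The abstract of entry 2 contrasts multiple-join queries in a relational database with native relationship traversal in a graph database. The sketch below only illustrates the idea of following stored edges hop by hop; it uses plain Python rather than Neo4j or MySQL, and every node name, relation name, and the helper functions `build_adjacency` and `reachable_within` are hypothetical, not taken from the paper.

```python
from collections import deque

# Hypothetical toy edges (source, relation, target); not from the paper's dataset.
EDGES = [
    ("drugA", "TARGETS", "gene1"),
    ("gene1", "INTERACTS_WITH", "gene2"),
    ("gene2", "ASSOCIATED_WITH", "diseaseX"),
    ("drugB", "TARGETS", "gene3"),
]


def build_adjacency(edges):
    """Index directed edges by source node so each hop is a dictionary lookup."""
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency


def reachable_within(adjacency, start, max_hops):
    """Breadth-first search returning {node: hop count} for nodes within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


adjacency = build_adjacency(EDGES)
result = reachable_within(adjacency, "drugA", 3)
print(result)
```

In a relational schema, each extra hop in such a query would typically add another join over the relationship table, which is the pattern the abstract reports as slow or unfinished in MySQL.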
3.Variable Threshold based Feature Selection using Spatial Distribution of Data.
Chang Sik SON ; A Mi SHIN ; Young Dong LEE ; Hee Joon PARK ; Hyoung Seob PARK ; Yoon Nyun KIM
Journal of Korean Society of Medical Informatics 2009;15(4):475-481
OBJECTIVE: In processing high dimensional clinical data, choosing the optimal subset of features is important, not only to reduce the computational complexity but also to improve the value of the model constructed from the given data. This study proposes an efficient feature selection method with a variable threshold. METHODS: In the proposed method, the spatial distribution of labeled data, which has non-redundant attribute values in the overlapping regions, was used to evaluate the degree of intra-class separation, and the weighted average of the redundant attribute values was used to select the cut-off value of each feature. RESULTS: The effectiveness of the proposed method was demonstrated by comparing the experimental results for the dyspnea patients' dataset, with 11 features selected from 55 features by clinical experts, with those obtained using seven other classification methods. CONCLUSION: The proposed method can work well for clinical data mining and pattern classification applications.
Data Mining; Dyspnea
4.A descriptive study of the regional and time-point changes in the Filipinos' internet search for tooth decay and toothache
Junhel Dalanon ; Yoshizo Matsuka
Philippine Journal of Health Research and Development 2020;24(1):39-45
Background:
The Philippines has one of the highest prevalence rates of untreated tooth decay (TD) in the world. Toothache (TA) is a common sequela of chronic and untreated TD. Google Trends (GT) offers an inexpensive and fast method of assessing search trends for these health conditions.
Objectives:
This study aimed to characterize the regional and time-point variations in the Filipinos' internet searches for TD and TA.
Methods:
A descriptive analysis was performed on a Google Trends search query using the search terms TD and TA. The parameters were constrained to include only data from the Philippines, from November 2009 to November 2019, under the health category, and the web search database.
Results:
The top three regions that had the highest searches for TA were MIMAROPA (100%), ARMM (100%), and Caraga (82%), while CAR (27%), Metro Manila (27%), and Ilocos Region had the highest search results for TD. From 2009 (19.85%) the searches for TA progressively increased until 2019 (92.61%), while the searches for TD remained comparable from 2009 (25.09%) to 2019 (25.98%).
Conclusion:
The results of this study reveal regional and time-point differences in the Filipinos' search interests for TD and TA.
Toothache; Health Behavior; Data Mining
5.STAT3 as a candidate transcriptomic prognosticator of sepsis severity levels
Acta Medica Philippina 2023;57(3):34-41
Background:
Sepsis is a life-threatening multiple-organ dysfunction caused by a dysregulated host response to infection and is the leading cause of death in non-cardiac intensive care facilities. Early reliable prediction of sepsis outcomes leads to cost-efficient resource allocation and therapeutic strategies. However, there are still no reliable markers to predict the outcome of patients at the initial stage of sepsis. Analyzing transcription profiles enables researchers to predict early outcomes using transcripts and their expression patterns. Transcriptomic profiling of septic patients has been done recently; however, analysis of prognostic outcomes is still scarce.
Objective:
This study aimed to determine transcriptional indicators that may be useful in the prognosis of the severity of sepsis.
Methods:
This is a prospective cohort study of Filipino patients admitted for sepsis at the national tertiary referral hospital in Manila, Philippines. We conducted differentially expressed gene analysis, network analyses, and area under the curve analysis of publicly available datasets of surviving vs. non-surviving sepsis patients to identify candidate prognosticator markers. Quantitative PCR was used to characterize the expression of each marker. An ordinal logistic regression model was fitted to determine which of the markers best predicts sepsis severity.
Results:
We identified ACTB, RAC1, STAT3, and UBQLN1 as candidate mRNA prognosticators. The expression of STAT3, a gene involved in immunosuppression, is inversely correlated with the severity of sepsis.
Conclusion:
Transcriptomic markers such as STAT3 can predict the severity of sepsis in patients. Early detection of its inverse expression may prompt early and more aggressive management of patients.
sepsis; STAT3; data mining; transcriptomics
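The Methods of entry 5 mention an area under the curve analysis comparing surviving and non-surviving patients. As a minimal sketch of what such a statistic measures for a single marker, the AUC can be computed as the fraction of (survivor, non-survivor) pairs in which the survivor has the higher value, with ties counted as one half. The expression values below are invented for illustration, and the function name `auc` is an assumption; nothing here reproduces the study's data or pipeline.

```python
def auc(scores_positive, scores_negative):
    """Fraction of positive/negative pairs where the positive scores higher.

    Ties count as 0.5. This is the pairwise (Mann-Whitney) form of the
    area under the ROC curve.
    """
    wins = 0.0
    for positive in scores_positive:
        for negative in scores_negative:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical marker expression values; not data from the study.
survivors = [5.1, 4.8, 6.0, 5.5]
non_survivors = [4.0, 4.8, 3.9]

marker_auc = auc(survivors, non_survivors)
print(round(marker_auc, 3))
```

A value near 0.5 would mean the marker does not separate the two groups, while values near 1.0 or 0.0 indicate strong separation in one direction or the other.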
6.Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Healthcare Informatics Research 2017;23(3):141-146
OBJECTIVES: With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. METHODS: This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. RESULTS: Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. CONCLUSIONS: Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
Classification; Cluster Analysis*; Data Mining*; Mining; Natural Language Processing
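Entry 6 lists pre-processing and text clustering among the text mining steps it reviews. One common building block for document clustering, shown here only as an illustrative sketch and not as the review's own method, is to weight terms by TF-IDF and compare documents by cosine similarity. The sample documents and the helper names `tokenize`, `tfidf_vectors`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple pre-processing: lowercase and keep alphabetic tokens only."""
    return [word for word in text.lower().split() if word.isalpha()]


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document (raw tf times log idf)."""
    tokenized = [tokenize(document) for document in documents]
    total = len(tokenized)
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_frequency = Counter(tokens)
        vectors.append({
            term: count * math.log(total / document_frequency[term])
            for term, count in term_frequency.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus for illustration.
DOCS = [
    "protein interaction extraction from abstracts",
    "protein interaction networks in yeast",
    "hospital management and forecasting",
]

vectors = tfidf_vectors(DOCS)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

A clustering algorithm would then group documents whose pairwise similarity is high; that grouping step is left out of this sketch.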
7.PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis.
Jae Hong EOM ; Byoung Tak ZHANG
Genomics & Informatics 2004;2(2):99-106
In this paper we introduce PubMiner, an intelligent machine learning based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning based data mining techniques for mining useful biological information such as protein-protein interactions from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity that were collected from the related public databases to infer more interactions from the original interactions. Inferred interactions from the interaction analysis and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The evaluation of inference proceeded using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.
Data Mining*; Mining; Natural Language Processing; Machine Learning
8.Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability.
Yong JUNG ; Hwa Jeong SEO ; Yu Rang PARK ; Jihun KIM ; Sang Jay BIEN ; Ju Han KIM
Genomics & Informatics 2011;9(1):19-27
Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, which has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to know whether preprocessing has been applied to a dataset and, if so, in what way. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata into MIAME elements. We also discriminated non-preprocessed raw datasets from processed ones and others by using a two-step classification method. Most of the procedures were developed as semi-automated algorithms with some degree of text mining techniques. We localized 2,967 Platforms, 4,867 Series and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema and developed a comprehensive query interface to extract them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/
Data Mining; DNA; Gene Expression; Mining; Oligonucleotide Array Sequence Analysis
9.Early Aberration Reporting System Modelling of Korean Emergency Syndromic Surveillance System for Bioterrorism.
Jae Bong CHUNG ; Moo Eob AHN ; Hee Cheol AHN ; Ki Cheol YOU ; Hyun KIM ; Jun Whi CHO ; Young A CHOI ; Eun Kyeong JEONG
Journal of the Korean Society of Emergency Medicine 2003;14(5):638-645
PURPOSE: This study was designed to provide a basis for an emergency syndromic surveillance warning system for detecting bioterrorism, by constructing predictive models from patients reported to the 'Emergency Syndromic Surveillance System' who were diagnosed with waterborne contagious diseases. METHODS: In this study, we used neural network analysis, one of the data mining methods, to analyze the reliable variables extracted from the reported databases of the Emergency Syndromic Surveillance System. RESULTS: We used the patient data pool reported to the Emergency Syndromic Surveillance System from 13 May 2002 to 13 May 2003. From these data we obtained the reliable variables (clinical symptoms, severity of patient, humidity, and temperature) for predicting waterborne infections. The study showed a successful prediction rate of 96% with an error rate of 0.4, using variables selected through chi-square analysis and a network with one hidden layer, which is nearly linear. CONCLUSION: Early emergency syndromic surveillance warning models built with the neural network in the Emergency Syndromic Surveillance System could enable early detection of waterborne infections, could stop their transmission at an early stage, and could furthermore be used as preventive and detective methods against bioterror attacks.
Bioterrorism; Data Mining; Emergencies*; Humans; Humidity
10.The Strategic Planning of Hospital Management using Information Technology.
Journal of Korean Society of Medical Informatics 1999;5(3):181-192
In the health-care market, the shift from a fee-for-service to a DRG environment has dramatically altered the landscape. To survive in this situation, hospitals have to change. Information technology is one means of change. This study presents the following information technology tools: data warehouse, data mart, OLAP, forecasting tools, statistical packages, and data mining.
Data Mining; Diagnosis-Related Groups; Forecasting