1.ManBIF: a Program for Mining and Managing Biobank Impact Factor Data.
Ki Jin YU ; Jungmin NAM ; Yun HER ; Minseock CHU ; Hyungseok SEO ; Junwoo KIM ; Jaepil JEON ; Hyekyung PARK ; Kiejung PARK
Genomics & Informatics 2011;9(1):37-38
Biobank Impact Factor (BIF), a very effective criterion for evaluating the activity of biobanks, can be estimated from the citations of biobanks in scientific papers. We have developed a program, ManBIF, to extract citation information from PDF files of published papers. The program manages a dictionary of expressions that refer to biobanks and their resources, mines citation information by converting PDF files to text files and searching them with the dictionary, and produces a statistical report file. It can serve as an important tool for biobanks.
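The dictionary-search step described above can be sketched in a few lines. This is a minimal illustration, not ManBIF itself: it assumes the PDF files have already been converted to plain text, and the dictionary entries ("Biobank A", "BBA", "Biobank B") and function names are hypothetical.

```python
import re
from collections import Counter

def build_patterns(dictionary):
    """Compile one case-insensitive regex per biobank from its list of expressions."""
    patterns = {}
    for name, expressions in dictionary.items():
        # Try longer expressions first so a long name wins over a shorter prefix.
        ordered = sorted(expressions, key=len, reverse=True)
        alternatives = "|".join(re.escape(e) for e in ordered)
        patterns[name] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return patterns

def count_citing_documents(texts, dictionary):
    """Count, per biobank, how many documents mention it at least once."""
    patterns = build_patterns(dictionary)
    counts = Counter()
    for text in texts:
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1
    return counts

def format_report(counts, total_docs):
    """Tab-separated report: biobank, citing documents, share of all documents."""
    rows = [f"{name}\t{n}\t{n / total_docs:.1%}" for name, n in counts.most_common()]
    return "\n".join(rows)

if __name__ == "__main__":
    # Hypothetical dictionary and texts; the PDF-to-text conversion is assumed done.
    dictionary = {"Biobank A": ["Biobank A", "BBA"], "Biobank B": ["Biobank B"]}
    texts = [
        "Samples were provided by Biobank A.",
        "We thank the BBA and Biobank B for DNA samples.",
        "No biobank is acknowledged here.",
    ]
    print(format_report(count_citing_documents(texts, dictionary), len(texts)))
```

Counting citing documents rather than raw mentions keeps one heavily acknowledged paper from inflating a biobank's score; whether ManBIF counts this way is not stated in the abstract.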
Mining
4.Introduction to BLAH5 special issue: recent progress on interoperability of biomedical text mining
Jin Dong KIM ; Kevin Bretonnel COHEN ; Nigel COLLIER ; Zhiyong LU ; Fabio RINALDI
Genomics & Informatics 2019;17(2):e12-
No abstract available.
Data Mining
5.Use of Graph Database for the Integration of Heterogeneous Biological Data.
Byoung Ha YOON ; Seon Kyu KIM ; Seon Young KIM
Genomics & Informatics 2017;15(1):19-27
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and infer relationships through multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used in computer science communities and the IT industry. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j responded very quickly to various queries, MySQL exhibited delayed or unfinished responses for complex queries with multiple-join statements. These results show that graph-based databases, such as Neo4j, are an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
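To make the contrast between multiple-join queries and native relationships concrete, the sketch below answers the same two-hop question (drug, then target gene, then associated disease) both ways: with a SQL join in an in-memory SQLite database and by following links in a plain adjacency list. It is only an illustration of the two data models, not Neo4j or the authors' schema; the table names, node labels, and toy data are hypothetical, and it says nothing about performance.

```python
import sqlite3

def load_relational(drug_targets, gene_diseases):
    """Store the two edge types as separate tables, as a relational schema would."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE drug_target (drug TEXT, gene TEXT)")
    con.execute("CREATE TABLE gene_disease (gene TEXT, disease TEXT)")
    con.executemany("INSERT INTO drug_target VALUES (?, ?)", drug_targets)
    con.executemany("INSERT INTO gene_disease VALUES (?, ?)", gene_diseases)
    return con

def diseases_by_join(con, drug):
    """Drug -> gene -> disease: each extra hop needs another join."""
    rows = con.execute(
        "SELECT DISTINCT gd.disease FROM drug_target AS dt "
        "JOIN gene_disease AS gd ON dt.gene = gd.gene "
        "WHERE dt.drug = ? ORDER BY gd.disease",
        (drug,),
    )
    return [disease for (disease,) in rows]

def load_graph(drug_targets, gene_diseases):
    """Store each node with direct links to its neighbours (an adjacency list)."""
    graph = {}
    for drug, gene in drug_targets:
        graph.setdefault(("drug", drug), set()).add(("gene", gene))
    for gene, disease in gene_diseases:
        graph.setdefault(("gene", gene), set()).add(("disease", disease))
    return graph

def diseases_by_traversal(graph, drug):
    """Follow links hop by hop from the drug node instead of joining tables."""
    found = set()
    for gene_node in graph.get(("drug", drug), ()):
        for kind, name in graph.get(gene_node, ()):
            if kind == "disease":
                found.add(name)
    return sorted(found)
```

With toy edges such as `[("drugA", "G1"), ("drugA", "G2")]` and `[("G1", "D1"), ("G2", "D2")]`, both functions return `["D1", "D2"]` for `"drugA"`. Adding a third hop (for example, disease to pathway) means another JOIN in the SQL version but only one more loop level in the traversal.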
Biology
;
Data Mining
6.Working environment and health of workers in Na Duong coal mine, Lang Son province
Journal of Preventive Medicine 2005;15(6):65-69
The study was conducted on workers in Na Duong coal mine, Lang Son province, to investigate the working environment, health status and diseases. The results showed that the working environment was contaminated with toxic agents above permissible limits, notably high silica dust levels. Common diseases were ear-nose-throat diseases (77.2%), eye diseases (39.9%), digestive diseases (17.8%), heart diseases (15.1%), and respiratory diseases (14.1%). Among respiratory diseases, silica dust-related disease was significant. The rate was 10% in the area surrounding the mine and 11% at the working site. Coal mine workers' health was slightly below average compared with other domestic manufacturing sectors, and no worker had health status at level I.
Environment
;
Health
;
Coal Mining
7.Variable Threshold based Feature Selection using Spatial Distribution of Data.
Chang Sik SON ; A Mi SHIN ; Young Dong LEE ; Hee Joon PARK ; Hyoung Seob PARK ; Yoon Nyun KIM
Journal of Korean Society of Medical Informatics 2009;15(4):475-481
OBJECTIVE: In processing high dimensional clinical data, choosing the optimal subset of features is important, not only to reduce the computational complexity but also to improve the value of the model constructed from the given data. This study proposes an efficient feature selection method with a variable threshold. METHODS: In the proposed method, the spatial distribution of labeled data, which has non-redundant attribute values in the overlapping regions, was used to evaluate the degree of intra-class separation, and the weighted average of the redundant attribute values was used to select the cut-off value of each feature. RESULTS: The effectiveness of the proposed method was demonstrated by comparing its experimental results on a dyspnea patient dataset of 11 features, selected by clinical experts from 55 features, with those obtained using seven other classification methods. CONCLUSION: The proposed method can work well for clinical data mining and pattern classification applications.
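The abstract only outlines the method, so the following is a simplified interpretation rather than the authors' algorithm. For a two-class problem it assumes that (a) a feature's separation score is the share of samples lying outside the region where the two class ranges overlap, and (b) the feature's own cut-off is the mean of the values inside that overlap region, which equals the count-weighted average of the two classes' overlap means. Each feature thus gets its own ("variable") threshold. The function names and the `min_separation` parameter are hypothetical.

```python
from statistics import mean

def split_by_class(values, labels):
    """Split one feature's values into the two class groups (binary labels only)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    first = [v for v, y in zip(values, labels) if y == classes[0]]
    second = [v for v, y in zip(values, labels) if y == classes[1]]
    return first, second

def score_feature(values, labels):
    """Return (separation, cutoff) for one feature.

    separation: share of samples outside the overlap of the two class ranges
    (1.0 means the classes do not overlap at all).
    cutoff: mean of the values inside the overlap region (the "redundant" values).
    """
    a, b = split_by_class(values, labels)
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    if lo > hi:
        # No overlap: place the cut-off midway across the gap between the classes.
        gap = (max(a), min(b)) if max(a) < min(b) else (max(b), min(a))
        return 1.0, sum(gap) / 2
    redundant = [v for v in values if lo <= v <= hi]
    return 1 - len(redundant) / len(values), mean(redundant)

def select_features(features, labels, min_separation):
    """Keep features whose separation reaches min_separation; map each to its own cut-off."""
    selected = {}
    for name, values in features.items():
        separation, cutoff = score_feature(values, labels)
        if separation >= min_separation:
            selected[name] = cutoff
    return selected
```

For example, with labels `[0, 0, 0, 1, 1, 1]`, a feature with values `[1, 2, 3, 10, 11, 12]` separates perfectly (score 1.0, cut-off 6.5), while `[1, 5, 9, 2, 6, 10]` has four of six samples in the overlap (score about 0.33, cut-off 5.5) and is dropped at `min_separation=0.5`.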
Data Mining
;
Dyspnea
8.Occupational safety, best practices, and legislative review on small-scale mining in the Philippines
Jinky Leilanie Lu ; Sophia Francesca Lu
Acta Medica Philippina 2022;56(1):12-23
Introduction:
Small-scale mining (SSM) has existed in the Philippines since the early 1900s and is a significant contributor to the local economy. SSM has contributed 14% of the country's total Gross Domestic Product and has a revenue share of about 19 billion pesos (380 million USD).
Objectives:
This study aims to document occupational safety and health in SSM in the Philippines and to identify best practices among miners and communities that reduce toxic chemical use in mining. It also aims to review the evolution of mining laws and legislative measures in the country as the basis for more aggressive policies and programs for SSM in the Philippines.
Methods:
The data were drawn from gray literature, peer-reviewed journals, databases, government statistics, and secondary literature. Data were analyzed through a critical appraisal of the impacts of mining in terms of occupational safety, mining issues, hazards and disasters, and environmental and health impacts, together with documentation of best practices in mining to reduce the use of toxic chemicals and of the current laws and legislation on mining in the Philippines.
Results:
SSM, or artisanal mining, is categorized as part of the informal sector of the market economy. In the Philippines, the leading types of accidents in the mines are being hit by falling objects, suffocation from chemical fumes, and crushing injuries; miners also face exposure to intense heat, poor ventilation, vibration, dust, fumes, repetitive stress injury, intense noise, manual handling (e.g., lifting) of heavy machinery, and biological and chemical hazards. Occupational illnesses include skin diseases, emphysema, chronic obstructive lung disease, and hearing loss. Because of these risks, the Philippines has adopted mercury-free mining, cyanide reduction, and green and climate-smart mining. The use of borax instead of mercury to recover gold from ore originated in the Philippines and is now widely known as the mercury-free gravity-borax method, adopted in Africa and Asia. The Philippines also has a plethora of laws covering mining as a whole. Developmental directives include enacting specific SSM laws and regulations, including a separate set of safety rules, and decentralizing the issuance and control of SSM permits and licenses through local government units. Several legislative measures, Presidential Decrees, and Administrative Orders have been crafted to provide a safety net and to address equity, safety, and health for small-scale miners, who are among the most vulnerable working populations.
Discussion:
Hazards and risks have been documented in SSM in the Philippines. However, the policies, legislation, and protective measures on SSM warrant more comprehensive coverage, implementation, and provision of social safety nets.
Conclusion:
The study concludes that mining in the Philippines continues to be a problem, as it has adverse effects on workers' health, the community, and the environment. It is crucial to ensure the health and safety of mining workers, and all players and stakeholders must fulfill their respective roles. Governments and communities need to perform their regulatory and monitoring functions dutifully and build up their capacities to benefit mining communities, which contribute substantially to the local economy.
Occupational Injuries
;
Mining
9.Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Healthcare Informatics Research 2017;23(3):141-146
OBJECTIVES: With the exponential increase in the number of articles published every year in the biomedical domain, there is a need for automated systems that extract previously unknown information from published articles. Text mining techniques enable the extraction of such knowledge from unstructured documents. METHODS: This paper reviews text mining processes in detail, the software tools available to carry out text mining, and the roles and applications of text mining in the biomedical domain. RESULTS: Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification, are described in detail. CONCLUSIONS: Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and help draw meaningful conclusions that would not otherwise be possible.
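As a concrete picture of the pre-processing and text clustering steps listed above, here is a minimal standard-library sketch: documents are tokenized, weighted with TF-IDF (raw term count times log(N / document frequency)), compared by cosine similarity, and grouped by single-linkage clustering cut at a similarity threshold. This is one common choice among many clustering methods and is not taken from the reviewed paper; the function names and the threshold value are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately crude pre-processing step)."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Raw term counts weighted by idf = log(N / document frequency)."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_documents(docs, threshold):
    """Single-linkage clustering: link any pair with cosine >= threshold, return connected groups."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

On four short titles, two about microarray gene expression and two about dust exposure in mines, a threshold of 0.15 yields the two expected topic clusters; real biomedical corpora would also need stop-word removal and stemming, which this sketch omits.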
Classification
;
Cluster Analysis*
;
Data Mining*
;
Mining
;
Natural Language Processing
10.Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability.
Yong JUNG ; Hwa Jeong SEO ; Yu Rang PARK ; Jihun KIM ; Sang Jay BIEN ; Ju Han KIM
Genomics & Informatics 2011;9(1):19-27
Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, which has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to know whether, and in what way, preprocessing has been applied to a dataset. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of GEO Data tables and mapped the attributes of GEO metadata to MIAME elements. We also distinguished raw (non-preprocessed) datasets from processed ones using a two-step classification method. Most of the procedures were developed as semi-automated algorithms using some text mining techniques. We localized 2,967 Platforms, 4,867 Series and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema and developed a comprehensive query interface for data extraction. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/
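The abstract does not describe the two-step classification, so the following is only a hypothetical sketch of how such a raw-versus-processed decision might be organized: step 1 looks for processing keywords in the sample's free-text metadata, and, if none are found, step 2 inspects the numeric values, on the assumption that log-transformed intensities stay small while raw scanner intensities often reach the thousands. The keyword list, the `log_scale_max` threshold, and all function names are assumptions, not GEOQuest's rules.

```python
import re

# Hypothetical keyword list for step 1; not taken from the GEOQuest paper.
PROCESSING_KEYWORDS = {"rma", "gcrma", "mas5", "normalized", "normalised", "log2", "quantile"}

def metadata_says_processed(description):
    """Step 1: look for processing keywords in the free-text metadata."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return bool(words & PROCESSING_KEYWORDS)

def values_look_processed(values, log_scale_max=100.0):
    """Step 2 (assumed heuristic): log-scale data rarely exceed a small maximum,
    whereas raw intensities are often in the thousands."""
    return max(values) < log_scale_max

def classify_sample(description, values):
    """Label a sample 'processed' or 'raw' using metadata first, values second."""
    if metadata_says_processed(description):
        return "processed"
    return "processed" if values_look_processed(values) else "raw"
```

Checking metadata first means an explicit statement such as "RMA normalized" always wins; the value-based fallback only decides samples whose descriptions say nothing about processing, which is where a heuristic threshold is most likely to misclassify.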
Data Mining
;
DNA
;
Gene Expression
;
Mining
;
Oligonucleotide Array Sequence Analysis