1.Making Sense of the Big Picture: Data Linkage and Integration in the Era of Big Data.
Healthcare Informatics Research 2018;24(4):251-252
No abstract available.
Information Storage and Retrieval*
2.Evaluation of Term Ranking Algorithms for Pseudo-Relevance Feedback in MEDLINE Retrieval.
Healthcare Informatics Research 2011;17(2):120-130
OBJECTIVES: The purpose of this study was to investigate the effects of query expansion algorithms for MEDLINE retrieval within a pseudo-relevance feedback framework. METHODS: A number of query expansion algorithms were tested using various term ranking formulas, focusing on query expansion based on pseudo-relevance feedback. The OHSUMED test collection, which is a subset of the MEDLINE database, was used as a test corpus. Various ranking algorithms were tested in combination with different term re-weighting algorithms. RESULTS: Our comprehensive evaluation showed that the local context analysis ranking algorithm, when used in combination with one of the reweighting algorithms - Rocchio, the probabilistic model, and our variants - significantly outperformed other algorithm combinations by up to 12% (paired t-test; p < 0.05). In a pseudo-relevance feedback framework, effective query expansion would be achieved by the careful consideration of term ranking and re-weighting algorithm pairs, at least in the context of the OHSUMED corpus. CONCLUSIONS: Comparative experiments on term ranking algorithms were performed in the context of a subset of MEDLINE documents. With medical documents, local context analysis, which uses co-occurrence with all query terms, significantly outperformed various term ranking methods based on both frequency and distribution analyses. Furthermore, the results of the experiments demonstrated that the term rank-based re-weighting method contributed to a remarkable improvement in mean average precision.
Information Storage and Retrieval ; Models, Statistical
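As a rough illustration of the pseudo-relevance feedback loop described in entry 2, the sketch below treats the top-ranked documents of an initial retrieval as relevant and applies a Rocchio-style re-weighting to the query. The toy corpus, the function names, and the `alpha`/`beta` values are assumptions made for illustration. Candidate terms are ranked by their plain weight in the feedback centroid, which is a simple frequency-based stand-in and not the local context analysis formula evaluated in the paper.

```python
from collections import Counter
import math


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio_expand(query, docs, top_k=2, n_terms=2, alpha=1.0, beta=0.75):
    """Expand a query using the top_k documents as pseudo-relevant feedback.

    alpha and beta are illustrative (assumed) Rocchio weights; no
    non-relevant term is used because pseudo-relevance feedback has no
    negative judgments.
    """
    q = tf_vector(query)
    doc_vecs = [tf_vector(d) for d in docs]

    # Initial retrieval: rank documents by similarity to the original query.
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q, doc_vecs[i]), reverse=True)
    feedback = [doc_vecs[i] for i in ranked[:top_k]]

    # Centroid of the pseudo-relevant documents.
    centroid = Counter()
    for vec in feedback:
        for term, weight in vec.items():
            centroid[term] += weight / len(feedback)

    # Term ranking: pick the n_terms heaviest centroid terms not in the query.
    candidates = sorted((t for t in centroid if t not in q),
                        key=lambda t: (-centroid[t], t))[:n_terms]

    # Re-weighting: original terms get alpha*q + beta*centroid,
    # expansion terms get beta*centroid.
    expanded = {t: alpha * w + beta * centroid.get(t, 0.0) for t, w in q.items()}
    for t in candidates:
        expanded[t] = beta * centroid[t]
    return expanded
```

Calling `rocchio_expand("aspirin heart", docs)` on a small list of document strings returns a dictionary of term weights that can be used as the second-pass query. Swapping the candidate-ranking key for another formula is the point at which different term ranking algorithms would be compared.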
3.Regulatory innovation for expansion of indications and pediatric drug development
Translational and Clinical Pharmacology 2018;26(4):155-159
For regulatory approval of a new drug, the preferred and most reliable source of evidence is the randomized controlled trial (RCT). However, a great number of drugs, both those in development and those already marketed and in use, lack proper indications for children. It is imperative to develop properly evaluated drugs for children, and expanding the use of already approved drugs to other indications will benefit patients and society. Nevertheless, when seeking approval for expanded indications of approved drugs, most often on the basis of off-label experience, or for pediatric indications, either during or after the main drug development, conducting RCTs may not be the only, or even the right, approach. Extrapolation strategies and modelling and simulation for pediatric drug development are paving the way to a better approval scheme. The use of data sources other than RCTs, such as EHR and claims data, in ways that improve the efficiency and validity of the results (e.g., randomized pragmatic trials and randomized registry trials) has been a topic of great interest around the world. Regulatory authorities should adopt new methodologies for regulatory approval processes to adapt to the changes brought by the increasing availability of big and real-world data that draw on new technological tools.
Child ; Humans ; Information Storage and Retrieval
4.Therapeutic evaluation on complex interventions of integrative medicine and the potential role of data mining.
Yu QIU ; Hao XU ; Dong-yan ZHAO
Chinese journal of integrative medicine 2010;16(5):466-471
It is a common view that the integration of Chinese medicine (CM) and modern Western medicine is an efficient way to facilitate the development of CM. Integrative medicine is a kind of complex intervention. Scientific therapeutic evaluation plays a crucial role in making integrative medicine universally acknowledged. However, the modern method of clinical study, which is based on the concept of evidence-based medicine, mostly focuses on population characteristics and a single interventional factor. As a result, it is difficult for this method to adapt fully to the clinical features of CM and integrative medicine as complex interventions. One possible way to solve this issue is to improve the existing method, integrate with it, and draw on evaluation models for complex interventions developed abroad. As an interdisciplinary technique, data mining involves database technology, artificial intelligence, machine learning, statistics, neural networks and other recent technologies, and has been widely used in the field of CM. Therefore, the application of data mining in the therapeutic evaluation of integrative medicine has broad prospects.
Information Storage and Retrieval ; Integrative Medicine
5.Design of an Integration System for Bioinformatics Data Sources Using a Global MDR.
Journal of Korean Society of Medical Informatics 2008;14(2):189-199
OBJECTIVES: As the amount of biological data is rapidly increasing, bioinformatics has become an important research area. Bioinformatics data sources are, however, distributed and heterogeneous, and therefore often poorly integrated and difficult to use together. Because many bioinformatics analyses need to make use of multiple information sources, the integration of bioinformatics data sources has become an important problem. The purpose of this paper is to present an integration system for bioinformatics data sources. METHODS: To solve this problem, we present an integration system for bioinformatics data sources using a global MDR, which gives users the efficiency and convenience of working as if with a single system. We deal with the extraction of data elements for bioinformatics MDRs by using ISO 11179 mandatory attributes. RESULTS: A global bioinformatics MDR schema for the given MDRs and the results of query processing are presented. CONCLUSIONS: The system and concepts proposed in this paper may be a good solution for the integration of diverse bioinformatics data sources.
Computational Biology ; Information Storage and Retrieval
6.Development of Microarray Gene Expression Database for MicroArray Gene Expression Markup Language.
Ji Yeon PARK ; Se Young KIM ; Yu Rang PARK ; Hwa Jeong SEO ; Ju Han KIM
Journal of Korean Society of Medical Informatics 2004;10(3):347-353
OBJECTIVE: Gene expression microarrays have become a widely used tool in biomedicine. With the growing need for microarray data sharing, there have been efforts to develop microarray standards. MAGE-OM (Microarray Gene Expression Object Model) is a data exchange model and MAGE-ML is an XML-based data exchange format. Most databases, however, do not have a structure suitable for storing MAGE-ML and making maximum use of the data. Therefore, we have created a relational database that implements MAGE-OM for the storage of MAGE-ML, with importing and exporting capabilities. METHODS: A relational schema is derived from MAGE-OM with a simple object-relational mapping strategy to reduce the complexity of MAGE-OM. Data transfer between the database and a MAGE-ML document is performed via MAGE-OM using the MAGE Software Toolkit (MAGEstk). RESULTS: Our database accepts microarray data as MAGE-ML files through a web-based interface, classifying them into two types of submission, array or experiment. The MAGE-ML import-export function is flexible enough to accommodate a changing data model because it separates the model definition and implementation layers. CONCLUSION: Standard-based implementation of a gene expression database enhances the collection and structured storage of large-scale gene expression data from heterogeneous data sources.
Information Storage and Retrieval ; Gene Expression* ; Information Dissemination
7.Implementation of a hierarchical storage manager system (HSM) in hospital PACS.
Chinese Journal of Medical Instrumentation 2007;31(3):211-227
In this paper, we discuss a series of problems in the hospital PACS storage system and introduce the implementation of hierarchical storage in this system. The detailed technical requirements of the system are also discussed.
Hospital Information Systems ; Information Storage and Retrieval
8.Utilizing social media data in post-market safety surveillance.
Yu YANG ; Sheng Feng WANG ; Si Yan ZHAN
Journal of Peking University(Health Sciences) 2021;53(3):623-627
Post-marketing surveillance is the principal means of ensuring drug safety, and spontaneous reporting is its essential method. Most spontaneous reports come from medical staff, and some come from patients who use the drug. In recent years, posts published by individuals on social media platforms that mention drugs and related adverse reactions have gradually come to be seen as a new data source similar to spontaneous reports from drug users. These user-generated posts potentially give researchers and regulators new opportunities to conduct post-marketing surveillance for drug safety mostly from the patients' perspective rather than that of medical professionals, and in theory they make it possible to discover drug-related safety issues earlier than traditional methods. As a new data source for safety signal detection and signal reinforcement, social media data have unique advantages in population coverage, types of drugs, types of adverse reactions, data timeliness and quantity. Most of the social media data used in post-marketing surveillance research for drug safety are still English text, even though people worldwide use multiple languages on several social media platforms. Unfortunately, whether social media data can serve as a reliable data source for routine post-marketing surveillance for drug safety remains controversial in academic circles. Several obstacles concerning data, methods and ethics must be overcome before social media data can be leveraged for post-marketing surveillance. The number of Chinese social media users is large, and social media data in the Chinese language are growing rapidly and could be employed as a potential data source for post-marketing surveillance for drug safety. However, owing to the specific characteristics of the Chinese language, the diversity of Chinese text differs from that of English text, and there are not enough accepted corpora for medical scenarios. In addition, the lack of domestic laws and regulations on privacy and security protection of social media data poses further challenges for applying Chinese social media data to post-market surveillance. Social media data are undoubtedly significant for post-marketing surveillance for drug safety. Overcoming the challenges of using social media data by developing new technologies and establishing new mechanisms will be an essential direction for future research.
Humans ; Information Storage and Retrieval ; Marketing ; Social Media
9.Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
Arnaud FERRÉ ; Mouhamadou BA ; Robert BOSSY
Genomics & Informatics 2019;17(2):e20-
Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate or bind words and expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality of the ontology representation. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
Dataset ; Information Storage and Retrieval ; Methods ; Semantics ; Vocabulary
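To make the entity normalization task in entry 9 more concrete, the sketch below maps a text mention to the nearest ontology concept by cosine similarity and returns nothing when every word is out of vocabulary. It is not the CONTES implementation: the word vectors, the concept vectors, and the concept identifiers are all hypothetical, and the sketch assumes mentions and concepts already share one vector space, whereas a supervised method would learn that mapping from training pairs.

```python
import math

# Hypothetical 3-dimensional word vectors (illustrative values only).
WORD_VECS = {
    "bacillus": [0.9, 0.1, 0.0],
    "subtilis": [0.8, 0.2, 0.1],
    "soil": [0.0, 0.9, 0.2],
    "sample": [0.1, 0.8, 0.3],
}

# Hypothetical concept vectors, assumed to live in the same space.
CONCEPT_VECS = {
    "ONTO:0001 bacterium": [1.0, 0.0, 0.0],
    "ONTO:0002 habitat": [0.0, 1.0, 0.0],
}


def embed(mention):
    """Average the vectors of known words; None if all are out of vocabulary."""
    known = [WORD_VECS[w] for w in mention.lower().split() if w in WORD_VECS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention):
    """Return the label of the closest concept, or None for an unknown mention."""
    vec = embed(mention)
    if vec is None:
        return None
    return max(CONCEPT_VECS, key=lambda c: cosine(vec, CONCEPT_VECS[c]))
```

Skipping unknown words, as `embed` does here, is only one simple way to handle out-of-vocabulary input; the abstract does not say which method the authors adopt.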
10.The Expressive Power of SNOMED-CT Compared with the Discharge Summaries.
Seung hee KIM ; Seung Bin HAN ; Jinwook CHOI
Journal of Korean Society of Medical Informatics 2005;11(3):265-272
OBJECTIVE: Standard vocabularies need to cover a diverse and enriched field of medical content, thereby facilitating semantic information retrieval, clinical decision support and efficient care delivery. SNOMED-CT (Systematized Nomenclature of Human and Veterinary Medicine-Clinical Terms) is a comprehensive and precise clinical reference terminology that provides unsurpassed clinical content and expressivity for clinical documentation and reporting. To investigate whether SNOMED-CT can serve this function in the Seoul National University Hospital (SNUH) environment, we evaluated the coverage of SNOMED-CT against the clinical terms in discharge summaries at SNUH. METHODS: We tested for discordance between the clinical terms in SNUH discharge summaries and those in SNOMED-CT. We extracted 9,554 concepts from 1,000 discharge summaries. From these concepts, we obtained 3,545 unique concepts, which were normalized for mapping to SNOMED-CT. These normalized terms were mapped to SNOMED-CT concepts with a semi-automatic method. RESULTS: We found a degree of concordance between SNOMED-CT and the clinical terms used in the discharge summaries. Approximately 89% of the medical terms in the discharge summaries were matched, and 11% of the concepts could not be mapped to those of SNOMED-CT. CONCLUSION: Through this study, we confirmed that SNOMED-CT is an appropriate reference terminology in the SNUH environment.
Humans ; Information Storage and Retrieval ; Semantics ; Seoul ; Vocabulary