1.Evaluation of Co-occurring Terms in Clinical Documents Using Latent Semantic Indexing.
Choonghyun HAN ; Sooyoung YOO ; Jinwook CHOI
Healthcare Informatics Research 2011;17(1):24-28
OBJECTIVES: Measurement of similarities between documents is typically influenced by the sparseness of the term-document matrix employed. Latent semantic indexing (LSI) may improve the results of this type of analysis. METHODS: In this study, LSI was applied in an attempt to reduce the term vector space of clinical documents and newspaper editorials. RESULTS: After LSI was applied, document similarities were revealed more clearly in clinical documents than in editorials. Clinical documents, which are characterized by co-occurring medical terms, various expressions for the same concepts, abbreviations, and typographical errors, showed greater improvement in the correlation between co-occurring terms and document similarities. CONCLUSIONS: Our results showed that LSI can be used effectively to measure similarities in clinical documents. In addition, the correlation between term co-occurrence and document similarity observed in this study is an important positive feature of LSI.
Abstracting and Indexing as Topic; Cluster Analysis; Information Storage and Retrieval; Periodicals; Semantics
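LSI compares documents in a low-rank space obtained from a truncated singular value decomposition (SVD) of the term-document matrix, so documents that share no terms can still become similar through co-occurring terms. The sketch below illustrates that idea with NumPy; it assumes raw term counts, a hypothetical toy corpus, and rank k = 2, none of which are taken from the paper.

```python
import numpy as np

def lsi_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return one k-dimensional LSI vector per document (rows).

    Truncated SVD: A ~ U_k diag(s_k) V_k^T; documents are the rows of V_k diag(s_k).
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical toy term-document count matrix (rows = terms, columns = documents).
terms = ["mi", "myocardial_infarction", "heart_attack", "fx", "fracture", "broken_bone"]
A = np.array([
    # d0 d1 d2 d3 d4 d5 d6 d7
    [0, 0, 1, 1, 0, 0, 0, 0],  # mi
    [1, 0, 1, 0, 0, 0, 0, 0],  # myocardial_infarction
    [0, 1, 0, 1, 0, 0, 0, 0],  # heart_attack
    [0, 0, 0, 0, 0, 0, 1, 1],  # fx
    [0, 0, 0, 0, 1, 0, 1, 0],  # fracture
    [0, 0, 0, 0, 0, 1, 0, 1],  # broken_bone
], dtype=float)

raw_sim = cosine(A[:, 0], A[:, 1])       # d0 and d1 share no terms
docs = lsi_doc_vectors(A, k=2)
lsi_sim = cosine(docs[0], docs[1])       # linked through co-occurrence with "mi"
cross_sim = cosine(docs[0], docs[4])     # cardiac vs. orthopedic document
print(f"raw={raw_sim:.2f} lsi={lsi_sim:.2f} cross={cross_sim:.2f}")
```

On this toy matrix, d0 and d1 have a raw cosine of 0, but in the two-dimensional LSI space they become nearly identical because both co-occur with "mi" in other documents, while the cardiac and orthopedic documents stay dissimilar.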
2.Toward the Automatic Generation of the Entry Level CDA Documents.
Sungwon JUNG ; Seunghee KIM ; Sooyoung YOO ; Jinwook CHOI
Journal of Korean Society of Medical Informatics 2009;15(1):141-151
OBJECTIVE: CDA (Clinical Document Architecture) is a markup standard for clinical document exchange. To increase the semantic interoperability of document exchange, the clinical statements in the narrative blocks should be encoded with code values. Natural language processing (NLP) is required to transform the narrative blocks into the coded elements of level 3 CDA documents. In this paper, we evaluate the accuracy of NLP-based text mapping methods. METHODS: We analyzed about one thousand discharge summaries to identify their characteristics, focusing on the syntactic patterns of their diagnostic sections. Depending on the pattern, different rules were applied to match statements to code values of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). RESULTS: The accuracy of matching was evaluated using five hundred discharge summaries. The precision was 86.5% for diagnosis, 61.8% for chief complaint, 62.7% for problem list, and 64.8% for discharge medication. CONCLUSION: The text processing method based on the pattern analysis of clinical statements can be used effectively to generate CDA entries.
Diagnosis; Natural Language Processing; Semantics; Systematized Nomenclature of Medicine
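The abstract describes applying different matching rules depending on the syntactic pattern of each diagnostic statement. A minimal sketch of that pattern-then-lookup idea follows; the numbering and qualifier patterns, the dictionary, and the codes are all hypothetical placeholders (no real SNOMED CT concept IDs are reproduced), not the rules used in the paper.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical term-to-code table; "CODE-000n" values are placeholders, not SNOMED CT IDs.
CODE_TABLE: Dict[str, str] = {
    "acute myocardial infarction": "CODE-0001",
    "pneumonia": "CODE-0002",
    "diabetes mellitus": "CODE-0003",
}

# Hypothetical syntactic patterns for a diagnosis section:
# leading list numbering ("1.", "2)", "#3") and qualifier prefixes ("r/o" = rule out).
NUMBERING = re.compile(r"^\s*(?:#\s*\d+|\d+[.)])\s*")
QUALIFIER = re.compile(r"^(r/o|s/p|suspected)\s+", re.IGNORECASE)

def normalize_statement(line: str) -> Tuple[str, Optional[str]]:
    """Strip list numbering and a leading qualifier; return (text, qualifier)."""
    text = NUMBERING.sub("", line, count=1).strip()
    match = QUALIFIER.match(text)
    qualifier = None
    if match:
        qualifier = match.group(1).lower()
        text = text[match.end():]
    return text.strip().lower(), qualifier

def map_statement(line: str) -> Dict[str, Optional[str]]:
    """Exact-match the normalized statement against the code table."""
    text, qualifier = normalize_statement(line)
    return {"text": text, "qualifier": qualifier, "code": CODE_TABLE.get(text)}

section = ["1. Acute myocardial infarction", "2) r/o Pneumonia", "#3 HTN"]
entries = [map_statement(line) for line in section]
for entry in entries:
    print(entry)
```

The unmapped abbreviation "HTN" shows where exact matching stops; a fuller system would presumably need abbreviation handling and fuzzier matching to raise precision.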
3.Recognizing Temporal Information in Korean Clinical Narratives through Text Normalization.
Healthcare Informatics Research 2011;17(3):150-155
OBJECTIVES: Acquiring temporal information is important because knowledge in clinical narratives is time-sensitive. In this paper, we describe an approach for extracting the temporal information found in Korean clinical narrative texts. METHODS: We developed a two-stage system consisting of an exhaustive text analysis phase and a temporal expression recognition phase. Because the target documents may include tokens in which Korean and English text are joined together, a corpus-based approach is used to decompose such complex tokens, separating the minimal semantic units from concatenated phrases and linguistic derivations within each token. A finite state machine is then applied to the minimal semantic units to find phrases that carry time-related information. RESULTS: The system was used to extract temporal expressions from Korean clinical narratives. Its performance was evaluated on 100 discharge summaries from Seoul National University Hospital containing a total of 805 temporal expressions. The system achieved a phrase-level precision and recall of 0.895 and 0.919, respectively. CONCLUSIONS: Finding information in Korean clinical narratives is a challenging task, since the text is written in both Korean and English and frequently omits syntactic elements and word spacing, which makes it extremely noisy. This study presents an effective method for acquiring the temporal information found in Korean clinical documents.
Automatic Data Processing; Linguistics; Medical Informatics; Medical Records; Multilingualism; Pattern Recognition, Automated; Semantics
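The two stages described above (decomposing mixed Korean-English tokens into minimal units, then running a finite state machine over those units) can be sketched as follows. This is an illustrative assumption, not the authors' system: the regex-based splitter stands in for their corpus-based decomposition, and the unit and relative-marker vocabularies are hypothetical and tiny.

```python
import re

# Hypothetical vocabularies: time units and relative markers ("전" = before/ago, "후" = after).
UNITS = {"day", "days", "week", "weeks", "month", "months", "year", "years",
         "일", "주", "개월", "년"}
RELATIVE = {"ago", "later", "전", "후"}

# Stand-in for corpus-based decomposition: split digits, Latin words,
# the multi-syllable unit "개월", and single Hangul syllables. Punctuation is dropped.
PIECES = re.compile(r"\d+|[A-Za-z]+|개월|[가-힣]")

def decompose(token):
    return PIECES.findall(token)

def classify(piece):
    low = piece.lower()
    if piece.isdigit():
        return "NUM"
    if low in UNITS:
        return "UNIT"
    if low in RELATIVE:
        return "REL"
    return "OTHER"

# FSM: START -NUM-> NUM -UNIT-> UNIT -REL-> REL; UNIT and REL are accepting states.
TRANSITIONS = {("START", "NUM"): "NUM", ("NUM", "UNIT"): "UNIT", ("UNIT", "REL"): "REL"}
ACCEPTING = {"UNIT", "REL"}

def find_temporal(tokens):
    """Return the longest accepted temporal phrase starting at each position."""
    pieces = [p for t in tokens for p in decompose(t)]
    found, i = [], 0
    while i < len(pieces):
        state, last_accept = "START", None
        for j in range(i, len(pieces)):
            state = TRANSITIONS.get((state, classify(pieces[j])))
            if state is None:
                break
            if state in ACCEPTING:
                last_accept = j
        if last_accept is None:
            i += 1
        else:
            found.append(" ".join(pieces[i:last_accept + 1]))
            i = last_accept + 1
    return found

tokens = ["Chest", "pain", "3일전", "started", ",", "admitted", "2", "weeks", "ago"]
print(find_temporal(tokens))
```

The concatenated token "3일전" ("3 days ago") is only recognized because it is first split into "3", "일", "전", which mirrors why the abstract places decomposition before recognition.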
4.Future Directions for Next-Generation Hospital Information System.
Healthcare Informatics Research 2015;21(1):1-2
No abstract available.
Hospital Information Systems*
5.Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles.
Healthcare Informatics Research 2012;18(1):18-28
OBJECTIVES: Machine learning systems can considerably reduce the time and effort needed by experts to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance in identifying high-quality articles for new procedure or drug SRs. METHODS: Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, based on a support vector machine, were trained on the combined data of the other topics, excluding the target topic. Training on the combination of included and commonly excluded articles was compared with training on the combination of included and excluded articles, using accuracy as the measure of comparison. RESULTS: On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when classification models trained on the combination of included and commonly excluded articles were used. The system using the combination of included and commonly excluded articles performed better than the combination of included and excluded articles in all of the procedure topics. CONCLUSIONS: Automatic, rigorous article classification using machine learning can reduce the workload of experts performing systematic reviews when topic-specific data are scarce. In particular, this system is more effective when the combination of included and commonly excluded articles is used.
Evidence-Based Medicine; Machine Learning; Review Literature as Topic; Support Vector Machine
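The training setup above pools labeled articles from every topic except the target topic and trains a linear SVM on them. A minimal pure-Python sketch follows; it swaps in the Pegasos stochastic sub-gradient method as the SVM trainer (the paper's SVM implementation is not stated here), omits the bias term, and uses hypothetical topics and bag-of-words features.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]  # (sparse features, label in {+1, -1})

def dot(w: Dict[str, float], x: Dict[str, float]) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def pegasos_train(data: List[Example], lam=0.01, epochs=50, seed=0) -> Dict[str, float]:
    """Linear SVM (hinge loss + L2) via Pegasos; no bias term, for brevity."""
    rng = random.Random(seed)
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            t += 1
            x, y = data[idx]
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)
            for f in w:                      # regularization: shrink all weights
                w[f] *= 1.0 - eta * lam
            if margin < 1:                   # hinge loss active: step toward example
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, x) -> int:
    return 1 if dot(w, x) >= 0 else -1

def train_for_topic(target: str, topics: Dict[str, List[Example]]):
    """Leave the target topic out and train on the pooled data of all other topics."""
    pooled = [ex for name, exs in topics.items() if name != target for ex in exs]
    return pegasos_train(pooled)

def accuracy(w, examples: List[Example]) -> float:
    return sum(predict(w, x) == y for x, y in examples) / len(examples)

# Hypothetical topics: +1 = included (high quality), -1 = excluded.
topics = {
    "A": [({"rct": 1.0, "blinded": 1.0}, 1), ({"case_report": 1.0}, -1), ({"editorial": 1.0}, -1)],
    "B": [({"rct": 1.0}, 1), ({"case_report": 1.0, "letter": 1.0}, -1)],
    "C": [({"rct": 1.0, "blinded": 1.0}, 1), ({"case_report": 1.0}, -1)],
}
w = train_for_topic("C", topics)
print(accuracy(w, topics["C"]))
```

Switching between "included + excluded" and "included + commonly excluded" training sets would only change which negative examples each topic contributes to `topics`; how "commonly excluded" articles are selected is not specified here and is left open.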
6.A Web-Based Pulse Wave Information Management System.
Journal of Korean Society of Medical Informatics 2002;8(3):47-53
This paper describes a web-based pulse wave information management system that applies a web solution to pulse wave extraction and to the management of patients' bio-signals. In oriental medicine, abnormal arterial pulse wave signals generated at specific points of the body are thought to be related to disease conditions of specific internal organs. Evaluating pulse wave signals has therefore long been used as a major diagnostic means. Numerous studies have been carried out on the development of pulse wave measuring instruments that can simply check a person's pulse waves at the radial artery; however, less research has been performed on analyzing pulse waves and managing the information in association with an oriental medical information system. Recently, as the use of instrumental pulse wave analysis has increased in the practice of oriental medicine, the need for a pulse wave information management system that can be interfaced with the oriental medical information system is also increasing. The web-based pulse wave information management system provides easy access to, and analysis and management of, pulse waves from anywhere, as long as the pulse wave analyser and a web browser are connected to the server system, and it also provides high availability of the pulse wave data. All pulse wave data were managed easily using XML-based communication for interchange of the pulse wave data among existing oriental medicine information systems.
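The abstract states that pulse wave data are exchanged as XML between systems. A minimal sketch of such a round trip with Python's standard `xml.etree.ElementTree` follows; the element and attribute names (`PulseWave`, `Samples`, `patientId`, `position`, `samplingHz`) are hypothetical, since the paper's XML schema is not given here.

```python
import xml.etree.ElementTree as ET

def pulse_wave_to_xml(patient_id: str, position: str, sampling_hz: int, samples) -> str:
    """Serialize one pulse wave recording (hypothetical schema)."""
    root = ET.Element("PulseWave", attrib={
        "patientId": patient_id,
        "position": position,            # e.g. measurement site on the radial artery
        "samplingHz": str(sampling_hz),
    })
    data = ET.SubElement(root, "Samples")
    data.text = " ".join(f"{v:.3f}" for v in samples)  # space-separated amplitudes
    return ET.tostring(root, encoding="unicode")

def xml_to_pulse_wave(xml_text: str) -> dict:
    """Parse the hypothetical schema back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    samples = [float(v) for v in root.findtext("Samples", "").split()]
    return {
        "patientId": root.get("patientId"),
        "position": root.get("position"),
        "samplingHz": int(root.get("samplingHz")),
        "samples": samples,
    }

message = pulse_wave_to_xml("P0001", "radial-left", 500, [0.1, 0.52, 0.487])
print(message)
record = xml_to_pulse_wave(message)
```

Keeping the format to plain elements and attributes means any receiving system with a standard XML parser can read the message, which is the interchange property the abstract emphasizes.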
7.Information Extraction Using Concept Node Analysis of Brain Radiology Reports Summarization.
Journal of Korean Society of Medical Informatics 2005;11(1):57-70
OBJECTIVE: Electronic medical records contain the majority of clinical data as unstructured text. Information in textual documents can be stored in a conceptual format and used to support clinical care through text summarization techniques. In this study, we present information extraction (IE) using concept nodes (CNs), which are extraction rules in a case frame, applied to brain radiology reports from Seoul National University Hospital (SNUH) for summarization. METHOD: The following steps were performed: designing a conceptual model that defines semantic entities as extraction templates for brain radiology reports, building a CN dictionary based on statistical syntactic patterns, and developing a parser to extract relevant information based on the defined templates. RESULTS: The three evaluations showed a 19% improvement in precision after post-processing that supplemented specified complex verb constructions, and 19.24~21.25% accurate semantic effectiveness from extracting additional Korean nouns. The average precision was 85.18%, the average recall was 93.71%, and the F-measure was 0.89. CONCLUSION: Our approach is advantageous for sentences that mix different languages. We expect that this IE technology can effectively summarize vast amounts of radiology text for clinical decision support systems, and we hope this study helps advance the representation of clinical data in Korean medical records and its integration into the EMR in the future.
Brain*; Electronic Health Records; Hope; Information Storage and Retrieval*; Medical Records; Semantics
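A concept node pairs a trigger word with a slot that is filled from the surrounding text when the trigger appears. The sketch below shows that mechanism with regular expressions standing in for the paper's statistically derived syntactic patterns; the node definitions and sentences are hypothetical, and negation is not handled. It also checks that the reported averages are consistent, since the F-measure is the harmonic mean 2PR/(P + R).

```python
import re
from dataclasses import dataclass

@dataclass
class ConceptNode:
    concept: str          # semantic entity in the extraction template
    trigger: str          # word that activates the node
    slot: re.Pattern      # pattern whose first group fills the slot

# Hypothetical CN dictionary for brain radiology reports.
CN_DICTIONARY = [
    ConceptNode("lesion_location", "infarction",
                re.compile(r"infarction (?:in|at) the ([a-z ]+?)(?:[.,]|$)")),
    ConceptNode("hemorrhage_location", "hemorrhage",
                re.compile(r"hemorrhage (?:in|at) the ([a-z ]+?)(?:[.,]|$)")),
]

def extract(sentence: str) -> dict:
    """Fire every concept node whose trigger occurs, then try to fill its slot."""
    text = sentence.lower()
    filled = {}
    for cn in CN_DICTIONARY:
        if cn.trigger in text:
            match = cn.slot.search(text)
            if match:
                filled[cn.concept] = match.group(1).strip()
    return filled

def f_measure(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(extract("Acute infarction in the left basal ganglia."))
print(round(f_measure(0.8518, 0.9371), 2))
```

With the reported precision of 85.18% and recall of 93.71%, the harmonic mean rounds to 0.89, matching the abstract.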
8.An Evaluation of Multiple Query Representations for the Relevance Judgments used to Build a Biomedical Test Collection.
Healthcare Informatics Research 2012;18(1):65-73
OBJECTIVES: The purpose of this study is to validate a method that uses multiple queries to create a set of relevance judgments, which indicate the documents pertinent to each query, when forming a biomedical test collection. METHODS: The aspect query is the major concept of this research; it can represent every aspect of the original query with the same information need. Aspect queries manually generated by 15 recruited participants were run using the BM25 retrieval model to create aspect query-based relevance sets (QRELS). To demonstrate the feasibility of these QRELS, the results of runs from the 2004 Genomics Track supported by the National Institute of Standards and Technology (NIST) were used to compute mean average precision (MAP) based on both the Text Retrieval Conference (TREC) QRELS and the aspect QRELS. The rank correlation was calculated using both Kendall's and Spearman's rank correlation methods. RESULTS: We experimentally verified the utility of the aspect query method, which combines the top-ranked documents retrieved by multiple queries. The resulting system rankings correlated highly with rankings based on human relevance judgments. CONCLUSIONS: High correlations of up to 0.863 (p < 0.01) were found between the judgment-free gold standard based on the aspect queries and the human-judged gold standard supported by NIST. The results also demonstrate that the aspect query method can contribute to building test collections for medical literature retrieval.
Genomics; Humans; Information Storage and Retrieval; Judgment; Statistics as Topic; Track and Field
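The evaluation above scores each run by MAP under two sets of relevance judgments and then compares the two resulting system rankings with Kendall's and Spearman's coefficients. A minimal pure-Python sketch of those computations follows; the Kendall version is tau-a and the Spearman version uses the d-squared formula, both assuming no ties, and the MAP values for the four systems are hypothetical illustration numbers, not results from the paper.

```python
def average_precision(ranked_docs, relevant):
    """AP: mean of precision@rank at each relevant document retrieved."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """run: {query: ranked doc list}; qrels: {query: set of relevant docs}."""
    return sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)

def kendall_tau(x, y):
    """Kendall's tau-a over paired scores (no tie correction)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)), assuming no ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
map_score = mean_average_precision({"q1": ["d1", "d2", "d3"]}, {"q1": {"d1", "d3"}})
# Hypothetical MAP of four systems under human QRELS vs. aspect QRELS.
map_human = [0.30, 0.25, 0.20, 0.10]
map_aspect = [0.26, 0.28, 0.15, 0.12]
print(kendall_tau(map_human, map_aspect), spearman_rho(map_human, map_aspect))
```

Only the ordering of systems matters for these coefficients, which is why a judgment-free QRELS can be useful even if its absolute MAP values differ from those under human judgments.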
9.Effective Query Expansion using Condensed UMLS Metathesaurus for Medical Information Retrieval.
Journal of Korean Society of Medical Informatics 2004;10(1):43-53
In medical records, the same concept is often expressed with several synonyms and various other expressions. Query expansion using a thesaurus enhances the recall of medical information retrieval (IR) systems for searching patient records or the literature. This study proposes an IR system architecture that applies the Metathesaurus (MT) of the Unified Medical Language System (UMLS). To enhance retrieval effectiveness while reducing retrieval time, we constructed a condensed Metathesaurus (CMT) consisting of terms frequently used in medical records. We used 40,000 brain CT/MRI radiology reports from Seoul National University Hospital. The retrieval model used was the Boolean model. Implementing the UMLS MT in the IR system for query expansion yielded a 15~27% gain in effectiveness when searching for relevant documents. However, retrieval took 3.5 times longer than with an IR system without the MT. When the CMT was applied instead, retrieval time was reduced by 50%, while retrieval performance decreased by only 8.7% compared with the IR system using the full MT. In this paper, we developed a medical document retrieval system that applies the UMLS MT for query expansion, which can improve relevant document retrieval performance while reducing retrieval time through a condensed Metathesaurus for a specific domain.
Brain; Computing Methodologies; Humans; Information Storage and Retrieval*; Medical Records; Seoul; Unified Medical Language System*; Vocabulary; Vocabulary, Controlled
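The approach above expands each query term with its thesaurus synonyms inside a Boolean model, and speeds this up by keeping only thesaurus entries that occur in the domain corpus. A minimal sketch follows; the concept identifiers, synonym sets, and documents are hypothetical (real UMLS concept IDs are not reproduced), and the "keep only terms present in the corpus" rule is an assumed simplification of how a condensed Metathesaurus might be built.

```python
def build_index(docs):
    """Inverted index: term -> set of document ids."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def build_condensed(thesaurus, corpus_terms):
    """Hypothetical CMT: drop synonyms that never occur in the corpus."""
    condensed = {}
    for concept, synonyms in thesaurus.items():
        kept = {s for s in synonyms if s in corpus_terms}
        if kept:
            condensed[concept] = kept
    return condensed

def expand(term, thesaurus):
    """Return the term plus all synonyms of the concept it belongs to."""
    for synonyms in thesaurus.values():
        if term in synonyms:
            return set(synonyms) | {term}
    return {term}

def boolean_search(query_terms, index, thesaurus):
    """AND across query terms; OR across each term's expanded synonyms."""
    result = None
    for term in query_terms:
        postings = set()
        for variant in expand(term, thesaurus):
            postings |= index.get(variant, set())
        result = postings if result is None else result & postings
    return result or set()

# Hypothetical thesaurus and pre-extracted document terms.
THESAURUS = {
    "CONCEPT-STROKE": {"stroke", "cerebral infarction", "cva", "apoplexy"},
    "CONCEPT-BLEED": {"hemorrhage", "haemorrhage", "bleeding"},
}
docs = {1: {"cva", "left"}, 2: {"cerebral infarction"}, 3: {"hemorrhage"}}
index = build_index(docs)
condensed = build_condensed(THESAURUS, set(index))

print(boolean_search(["cva"], index, {}))          # no expansion
print(boolean_search(["cva"], index, THESAURUS))   # full thesaurus
print(boolean_search(["cva"], index, condensed))   # condensed thesaurus
```

In this toy case the condensed table returns the same documents as the full one while holding fewer entries to scan, which is the trade-off the abstract measures at scale.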
10.Evaluation of Term Ranking Algorithms for Pseudo-Relevance Feedback in MEDLINE Retrieval.
Healthcare Informatics Research 2011;17(2):120-130
OBJECTIVES: The purpose of this study was to investigate the effects of query expansion algorithms for MEDLINE retrieval within a pseudo-relevance feedback framework. METHODS: A number of query expansion algorithms were tested using various term ranking formulas, focusing on query expansion based on pseudo-relevance feedback. The OHSUMED test collection, which is a subset of the MEDLINE database, was used as a test corpus. Various ranking algorithms were tested in combination with different term re-weighting algorithms. RESULTS: Our comprehensive evaluation showed that the local context analysis ranking algorithm, when used in combination with one of the re-weighting algorithms (Rocchio, the probabilistic model, or our variants), significantly outperformed other algorithm combinations by up to 12% (paired t-test; p < 0.05). In a pseudo-relevance feedback framework, effective query expansion may be achieved by carefully considering term ranking and re-weighting algorithm pairs, at least in the context of the OHSUMED corpus. CONCLUSIONS: Comparative experiments on term ranking algorithms were performed in the context of a subset of MEDLINE documents. With medical documents, local context analysis, which uses co-occurrence with all query terms, significantly outperformed various term ranking methods based on both frequency and distribution analyses. Furthermore, the experiments demonstrated that the term rank-based re-weighting method contributed to a remarkable improvement in mean average precision.
Information Storage and Retrieval; Models, Statistical
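In pseudo-relevance feedback, the top-ranked documents are assumed relevant, candidate expansion terms are ranked, and the expanded query is re-weighted. The sketch below shows one such ranking/re-weighting pair using Rocchio's formula q' = alpha * q + beta * centroid(top documents). It uses the Rocchio weight itself for term ranking rather than local context analysis (the ranking method the abstract found best), and the alpha, beta, and n_terms values and the toy term weights are hypothetical.

```python
from collections import Counter

def rocchio_expand(query_vec, top_docs, alpha=1.0, beta=0.75, n_terms=2):
    """Pseudo-relevance feedback with Rocchio re-weighting.

    query_vec: {term: weight}; top_docs: list of {term: weight} for the
    top-ranked documents, which are assumed relevant. Candidate terms not in
    the query are ranked by centroid weight (ties broken alphabetically),
    and only the n_terms best are added.
    """
    centroid = Counter()
    for doc in top_docs:
        for term, w in doc.items():
            centroid[term] += w / len(top_docs)
    new_query = {t: alpha * w for t, w in query_vec.items()}
    candidates = sorted((t for t in centroid if t not in query_vec),
                        key=lambda t: (-centroid[t], t))
    for term in list(query_vec) + candidates[:n_terms]:
        new_query[term] = new_query.get(term, 0.0) + beta * centroid.get(term, 0.0)
    return new_query

query = {"asthma": 1.0}
top_docs = [
    {"asthma": 2.0, "inhaler": 1.0, "steroid": 1.0},
    {"asthma": 1.0, "steroid": 2.0, "child": 1.0},
]
expanded = rocchio_expand(query, top_docs)
print(expanded)
```

Because ranking and re-weighting are separate steps here, another term ranking method (such as local context analysis) could replace the `sorted(...)` line while keeping the same Rocchio re-weighting, which is the kind of pairing the study compares.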