1.Cross-modal hash retrieval of medical images based on Transformer semantic alignment.
Qianlin WU ; Lun TANG ; Qinghai LIU ; Liming XU ; Qianbin CHEN
Journal of Biomedical Engineering 2025;42(1):156-163
Medical cross-modal retrieval aims to achieve semantic similarity search between different modalities of medical cases, such as quickly locating relevant ultrasound images through ultrasound reports, or using ultrasound images to retrieve matching reports. However, existing medical cross-modal hash retrieval methods face significant challenges, including semantic and visual differences between modalities and the limited scalability of hash algorithms on large-scale data. To address these challenges, this paper proposes a Transformer-based Medical image Semantic Alignment Cross-modal Hashing algorithm (MSACH). The algorithm employs a segmented training strategy that combines modality feature extraction with hash function learning, effectively extracting low-dimensional features that contain important semantic information. A Transformer encoder is used for cross-modal semantic learning. By introducing manifold similarity constraints, balance constraints, and a linear classification network constraint, the algorithm enhances the discriminability of the hash codes. Experimental results demonstrate that MSACH improves the mean average precision (MAP) by 11.8% and 12.8% on two datasets compared with traditional methods. The algorithm performs well in both retrieval accuracy and the handling of large-scale medical data, showing promising potential for practical applications.
Algorithms
;
Semantics
;
Humans
;
Ultrasonography
;
Information Storage and Retrieval/methods*
;
Image Processing, Computer-Assisted/methods*
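
The abstract above reports results as mean average precision (MAP) over hash-based retrieval. As a rough illustration of how such a score is commonly computed, the sketch below ranks database items by Hamming distance to each query code and averages the per-query average precision. It is not the MSACH implementation: the function names, the use of Python integers as hash codes, and the rule that two items are relevant when their labels are equal are all assumptions made for this example.

```python
from typing import List, Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision of one ranked result list (True = relevant item)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    query_codes: Sequence[int],
    query_labels: Sequence[str],
    db_codes: Sequence[int],
    db_labels: Sequence[str],
) -> float:
    """MAP over all queries; relevance is assumed to mean 'same label'."""
    if not query_codes:
        return 0.0
    ap_values: List[float] = []
    for q_code, q_label in zip(query_codes, query_labels):
        # Rank database items by Hamming distance to the query code.
        # sorted() is stable, so ties keep their database order.
        order = sorted(
            range(len(db_codes)),
            key=lambda i: hamming_distance(q_code, db_codes[i]),
        )
        ap_values.append(average_precision([db_labels[i] == q_label for i in order]))
    return sum(ap_values) / len(ap_values)


# Hypothetical 4-bit codes: queries stand in for report hashes,
# database entries for image hashes.
db_codes = [0b1010, 0b0101, 0b1000]
db_labels = ["liver", "kidney", "liver"]
map_score = mean_average_precision([0b1010, 0b0101], ["liver", "liver"], db_codes, db_labels)
```

In this toy data the first query ranks both relevant items first, while the second query has a code that matches an irrelevant item, which lowers its average precision and therefore the overall MAP.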
2.Cross modal medical image online hash retrieval based on online semantic similarity.
Qinghai LIU ; Lun TANG ; Qianlin WU ; Liming XU ; Qianbin CHEN
Journal of Biomedical Engineering 2025;42(2):343-350
Online hashing methods are receiving increasing attention in cross-modal medical image retrieval research. However, existing online methods often lack the ability to maintain semantic correlation between new and existing data. To this end, we proposed an online semantic similarity cross-modal hashing (OSCMH) learning framework to incrementally learn compact binary hash codes for streaming medical data. Within it, a sparse representation of existing data based on online anchor datasets was designed to avoid semantic forgetting and to adaptively update hash codes, which effectively maintained the semantic correlation between existing and arriving data, reduced information loss, and improved training efficiency. In addition, an online discrete optimization method was proposed to solve the binary optimization problem of the hash codes by incrementally updating the hash function and optimizing the hash codes on the streaming data. Compared with existing online and offline hashing methods, the proposed algorithm achieved average retrieval accuracy improvements of 12.5% and 14.3% on two datasets, respectively, effectively enhancing retrieval efficiency in the field of medical images.
Semantics
;
Humans
;
Algorithms
;
Information Storage and Retrieval/methods*
;
Diagnostic Imaging
;
Image Processing, Computer-Assisted/methods*
3.Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
Arnaud FERRÉ ; Mouhamadou BA ; Robert BOSSY
Genomics & Informatics 2019;17(2):e20-
Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate words or multi-word expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods for reducing the dimensionality of the ontology representation. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
Dataset
;
Information Storage and Retrieval
;
Methods
;
Semantics
;
Vocabulary
4.Rotavirus Vaccine Coverage and Related Factors
Sok Goo LEE ; So Youn JEON ; Kwang Suk PARK
Journal of the Korean Society of Maternal and Child Health 2019;23(3):175-184
PURPOSE: The vaccination rate for the rotavirus vaccine, which is not supported by the government, is not known. As vaccines not included in the national immunization schedule are not registered in the computerized national immunization registry system, their vaccination rate cannot be calculated by the same method used for government-supported vaccines. Therefore, this study aimed to measure the vaccination rate for the rotavirus vaccine, which is not included in the national immunization schedule. METHODS: The target population was the 0-year-old cohort. The survey population was composed of children born in 2017 who were registered in the Immunization Registry Information System. The survey was conducted through a computerized telephone survey method. The survey variables were as follows: vaccination order and date, provider, and data source. Factors examined in relation to complete vaccination were the child's sex, residence, birth order, and parents' age, educational level, and job status. RESULTS: Children's vaccination rates for the rotavirus vaccine by 2017 were 88.0%, 86.9%, and 96.6% for the first, second, and third doses, respectively. The rate of complete vaccination was 85.6%. The factors related to complete rotavirus vaccination were the child's sex and birth order, area of residence, parents' age and job status, and father's education level. CONCLUSION: In the future, regular investigations of the rotavirus vaccination rate are needed, both as a tool for developing rotavirus infection control policy and as an evaluation tool for vaccine programs.
Birth Order
;
Child
;
Cohort Studies
;
Communicable Diseases
;
Education
;
Health Services Needs and Demand
;
Humans
;
Immunization
;
Immunization Schedule
;
Information Storage and Retrieval
;
Information Systems
;
Methods
;
Rotavirus
;
Surveys and Questionnaires
;
Telephone
;
Vaccination
;
Vaccines
5.Digital Epidemiology: Use of Digital Data Collected for Non-epidemiological Purposes in Epidemiological Studies.
Hyeoun Ae PARK ; Hyesil JUNG ; Jeongah ON ; Seul Ki PARK ; Hannah KANG
Healthcare Informatics Research 2018;24(4):253-262
OBJECTIVES: We reviewed digital epidemiological studies to characterize how researchers are using digital data by topic domain, study purpose, data source, and analytic method. METHODS: We reviewed research articles published within the last decade that used digital data to answer epidemiological research questions. Data were abstracted from these articles using a data collection tool that we developed. Finally, we summarized the characteristics of the digital epidemiological studies. RESULTS: We identified six main topic domains: infectious diseases (58.7%), non-communicable diseases (29.4%), mental health and substance use (8.3%), general population behavior (4.6%), environmental, dietary, and lifestyle (4.6%), and vital status (0.9%). We identified four categories for the study purpose: description (22.9%), exploration (34.9%), explanation (27.5%), and prediction and control (14.7%). We identified eight categories for the data sources: web search query (52.3%), social media posts (31.2%), web portal posts (11.9%), webpage access logs (7.3%), images (7.3%), mobile phone network data (1.8%), global positioning system data (1.8%), and others (2.8%). Of these, 50.5% used correlation analyses, 41.3% regression analyses, 25.6% machine learning, and 19.3% descriptive analyses. CONCLUSIONS: Digital data collected for non-epidemiological purposes are being used to study health phenomena in a variety of topic domains. Digital epidemiology requires access to large datasets and advanced analytics. Ensuring open access is clearly at odds with the desire to have as little personal data as possible in these large datasets to protect privacy. Establishment of data cooperatives with restricted access may be a solution to this dilemma.
Cell Phones
;
Communicable Diseases
;
Data Collection
;
Dataset
;
Epidemiologic Studies*
;
Epidemiological Monitoring
;
Epidemiology*
;
Geographic Information Systems
;
Humans
;
Information Storage and Retrieval
;
Internet
;
Life Style
;
Machine Learning
;
Mental Health
;
Methods
;
Privacy
;
Public Health Surveillance
;
Social Media
6.Traditional Chinese Medicine data management policy in big data environment.
Yang LIANG ; Chang-Song DING ; Xin-di HUANG ; Le DENG
China Journal of Chinese Materia Medica 2018;43(4):840-846
Because the traditional data management model cannot effectively manage the massive data in traditional Chinese medicine (TCM), owing to the uncertainty of data object attributes and the diversity and abstraction of data representation, a management strategy for TCM data based on big data technology is proposed. Based on the actual characteristics of TCM data, this strategy addresses the uncertainty of data object attributes in TCM information and the non-uniformity of data representation by using the schema-free ("modeless") properties of stored objects in big data technology. A hybrid indexing mode was also used to resolve the conflicts caused by different storage modes in the indexing process, and massive data were queried through an efficient parallel MapReduce process. The theoretical analysis provided the management framework and its key technologies, and its performance was tested on Hadoop using several common traditional Chinese medicines and prescriptions from a practical TCM data source. The results showed that this strategy can effectively solve the storage problem of TCM information, with good query efficiency, completeness, and robustness.
Big Data
;
Information Storage and Retrieval
;
Methods
;
Medicine, Chinese Traditional
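
The abstract above mentions indexing and querying TCM records through a MapReduce process. The following is only a single-process sketch of the map and reduce steps behind an inverted index (herb to prescription IDs). It does not use Hadoop and is not the authors' hybrid indexing scheme; the record layout, herb names, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record_id: str, herbs: Iterable[str]) -> List[Tuple[str, str]]:
    """Map step: emit one (herb, record_id) pair per herb in a prescription."""
    return [(herb, record_id) for herb in herbs]


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reduce step: group record IDs by herb to form an inverted index."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for herb, record_id in pairs:
        grouped[herb].append(record_id)
    return {herb: sorted(ids) for herb, ids in grouped.items()}


# Hypothetical prescriptions; records may list different numbers of herbs,
# which loosely mirrors the variable attributes described in the abstract.
prescriptions = {
    "rx1": ["ginseng", "licorice"],
    "rx2": ["licorice", "ginger"],
    "rx3": ["ginseng"],
}

# In a real MapReduce job the map calls would run in parallel on separate nodes;
# here they are simply run one after another.
pairs = [pair for rid, herbs in prescriptions.items() for pair in map_phase(rid, herbs)]
herb_index = reduce_phase(pairs)
```

Looking up `herb_index["licorice"]` then returns the prescriptions that contain that herb without scanning every record.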
7.Introduction of Artificial Intelligence in Pathology.
Hanyang Medical Reviews 2017;37(2):77-85
Pathology has as long a history of using artificial intelligence (AI) as any other field of medicine, and has used AI algorithms continuously. However, in Korea, pathology AI is unfamiliar even to pathologists. In this article, I will summarize the terms and definitions, the basic elements of pathology AI, and the future direction. Digital pathology is a system or environment in which glass slides are digitized into binary files that are viewed on a monitor or other digital device, interpreted, analyzed, and maintained. Computational pathology is a comprehensive concept of a diagnosis support or research system that deals with image, text, and omics data. Virtual microscopy is a method or technology that allows pathologists to view and share glass slide images from whole slide scanners. Image analysis is a technique or method that processes various digital images and quantifies features. The basic elements of pathology AI are as follows: environmental factors called digital pathology and technical elements such as AI, machine learning, and deep learning. The digital pathology workflow consists of three elements: acquisition or collection of data, data processing, and data storage. The basic process of image analysis consists of preprocessing of the image, identification of the region of interest, and feature extraction. There is enormous potential for improvement of patient care through digital pathology and/or AI, and a harmonized discussion among government, academia, and industry about activating Korean digital pathology will be mandatory for future medicine and healthcare in Korea.
Artificial Intelligence*
;
Delivery of Health Care
;
Diagnosis
;
Glass
;
Information Storage and Retrieval
;
Korea
;
Learning
;
Machine Learning
;
Methods
;
Microscopy
;
Pathology*
;
Patient Care
8.Methods Using Social Media and Search Queries to Predict Infectious Disease Outbreaks.
Healthcare Informatics Research 2017;23(4):343-348
OBJECTIVES: For earlier detection of infectious disease outbreaks, a digital syndromic surveillance system based on search queries or social media should be utilized. By using real-time data sources, a digital syndromic surveillance system can overcome the time delay that limits traditional surveillance systems. Here, we introduce an approach to develop such a digital surveillance system. METHODS: We first explain how statistical data on infectious diseases, such as influenza and Middle East Respiratory Syndrome (MERS) in Korea, can be collected as reference data. We then explain how search engine queries can be retrieved from Google Trends. Finally, we describe the implementation of the prediction model using lagged correlation, which can be calculated with a statistical package such as SPSS (Statistical Package for the Social Sciences). RESULTS: Lag correlation analyses demonstrated that search engine data and Twitter data have a significant temporal relationship with influenza and MERS data. Therefore, the proposed digital surveillance system can be used to predict infectious disease outbreaks earlier. CONCLUSIONS: This prediction method could be the core engine for implementing a (near-) real-time digital surveillance system. A digital surveillance system that uses Internet resources has enormous potential to monitor disease outbreaks in the early phase.
Communicable Diseases*
;
Coronavirus Infections
;
Disease Outbreaks*
;
Influenza, Human
;
Information Storage and Retrieval
;
Internet
;
Korea
;
Methods*
;
Search Engine
;
Social Media*
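
The lagged correlation named in the abstract above can be illustrated without a statistical package. The sketch below computes the Pearson correlation between a search-volume series and a case-count series shifted by 0 to `max_lag` periods. The series values, the function names, and the convention that a positive lag means searches lead cases are assumptions for this example, not data or code from the article.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(
    search: Sequence[float], cases: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlate search[t] with cases[t + lag] for lag = 0..max_lag."""
    if len(search) != len(cases):
        raise ValueError("series must have the same length")
    result: Dict[int, float] = {}
    for lag in range(max_lag + 1):
        shifted_search = search[: len(search) - lag]  # drop the last `lag` points
        shifted_cases = cases[lag:]                    # drop the first `lag` points
        result[lag] = pearson(shifted_search, shifted_cases)
    return result


# Made-up weekly series in which reported cases follow search volume by two weeks.
search_volume = [1, 3, 7, 4, 2, 1, 1, 1]
reported_cases = [1, 1, 1, 3, 7, 4, 2, 1]
correlations = lagged_correlations(search_volume, reported_cases, max_lag=3)
best_lag = max(correlations, key=correlations.get)
```

The lag with the highest correlation suggests how far ahead the search signal runs relative to the reference data; a real analysis would also test the significance of each coefficient.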
9.Using the capture-recapture method to estimate the human immunodeficiency virus-positive population.
Jalal POOROLAJAL ; Younes MOHAMMADI ; Farzad FARZINARA
Epidemiology and Health 2017;39(1):e2017042-
OBJECTIVES: The capture-recapture method was applied to estimate the number of human immunodeficiency virus (HIV)-positive individuals not registered with any data sources. METHODS: This cross-sectional study was conducted in Lorestan Province, in the west of Iran, in 2016. Three incomplete sources of HIV-positive individuals, with partially overlapping data, were used: (a) the transfusion center, (b) volunteer counseling and testing centers (VCTCs), and (c) the prison. The 3-source capture-recapture method, using a log-linear model, was applied for data analysis. The Akaike information criterion and the Bayesian information criterion were used for model selection. RESULTS: Of the 2,456 HIV-positive patients registered in these 3 data sources, 1,175 (47.8%) were identified in the transfusion center, 867 (35.3%) in VCTCs, and 414 (16.8%) in the prison. After the exclusion of duplicate entries, 2,281 HIV-positive patients remained. Based on the capture-recapture method, 14,868 (95% confidence interval, 9,923 to 23,427) HIV-positive individuals were not identified in any of the registries. Therefore, the real number of HIV-positive individuals was estimated to be 17,149, and the overall completeness of the 3 registries was estimated to be around 13.3%. CONCLUSIONS: Based on capture-recapture estimates, a huge number of HIV-positive individuals are not registered with any of the provincial data sources. This is an urgent message for policymakers who plan and provide health care services for HIV-positive patients. Although the capture-recapture method is a useful statistical approach for estimating unknown populations, the assumptions and limitations of the method mean that the population size may have been overestimated, as seems possible in our results.
Counseling
;
Cross-Sectional Studies
;
Delivery of Health Care
;
HIV
;
HIV Seropositivity
;
Humans*
;
Information Storage and Retrieval
;
Iran
;
Linear Models
;
Methods*
;
Population Density
;
Prisons
;
Registries
;
Statistics as Topic
;
Volunteers
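
The totals reported in the abstract above can be checked with simple arithmetic: the estimated population is the observed count plus the estimated unobserved count, and completeness is the observed share of that total. The sketch below reproduces those two figures and, for intuition only, adds the classic two-source Lincoln-Petersen estimator. That estimator is a simpler technique than the 3-source log-linear model the study used, and the two-source numbers in the example are hypothetical.

```python
def estimated_total(observed: int, estimated_unobserved: int) -> int:
    """Estimated population size: registered cases plus estimated missed cases."""
    return observed + estimated_unobserved


def completeness(observed: int, estimated_unobserved: int) -> float:
    """Fraction of the estimated population that appears in the registries."""
    return observed / estimated_total(observed, estimated_unobserved)


def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Two-source estimate N = n1 * n2 / m.

    n1 and n2 are the counts in each source and m is the number found in both.
    This assumes independent sources and a closed population, which is
    a stronger assumption than a multi-source log-linear model needs.
    """
    if overlap <= 0:
        raise ValueError("the two sources must share at least one individual")
    return n1 * n2 / overlap


# Figures taken from the abstract: 2,281 unique registered patients and
# an estimated 14,868 unregistered individuals.
total = estimated_total(2281, 14868)
share = completeness(2281, 14868)

# Hypothetical two-source example (not from the study).
toy_estimate = lincoln_petersen(n1=200, n2=150, overlap=30)
```

With the abstract's figures, `total` is 17,149 and `share` rounds to 13.3%, matching the reported values.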
10.Using the capture-recapture method to estimate the human immunodeficiency virus-positive population
Jalal POOROLAJAL ; Younes MOHAMMADI ; Farzad FARZINARA
Epidemiology and Health 2017;39(1):2017042-
OBJECTIVES: The capture-recapture method was applied to estimate the number of human immunodeficiency virus (HIV)-positive individuals not registered with any data sources. METHODS: This cross-sectional study was conducted in Lorestan Province, in the west of Iran, in 2016. Three incomplete sources of HIV-positive individuals, with partially overlapping data, were used: (a) the transfusion center, (b) volunteer counseling and testing centers (VCTCs), and (c) the prison. The 3-source capture-recapture method, using a log-linear model, was applied for data analysis. The Akaike information criterion and the Bayesian information criterion were used for model selection. RESULTS: Of the 2,456 HIV-positive patients registered in these 3 data sources, 1,175 (47.8%) were identified in the transfusion center, 867 (35.3%) in VCTCs, and 414 (16.8%) in the prison. After the exclusion of duplicate entries, 2,281 HIV-positive patients remained. Based on the capture-recapture method, 14,868 (95% confidence interval, 9,923 to 23,427) HIV-positive individuals were not identified in any of the registries. Therefore, the real number of HIV-positive individuals was estimated to be 17,149, and the overall completeness of the 3 registries was estimated to be around 13.3%. CONCLUSIONS: Based on capture-recapture estimates, a huge number of HIV-positive individuals are not registered with any of the provincial data sources. This is an urgent message for policymakers who plan and provide health care services for HIV-positive patients. Although the capture-recapture method is a useful statistical approach for estimating unknown populations, the assumptions and limitations of the method mean that the population size may have been overestimated, as seems possible in our results.
Counseling
;
Cross-Sectional Studies
;
Delivery of Health Care
;
HIV
;
HIV Seropositivity
;
Humans
;
Information Storage and Retrieval
;
Iran
;
Linear Models
;
Methods
;
Population Density
;
Prisons
;
Registries
;
Statistics as Topic
;
Volunteers
