Search Results

1.Medical text classification model integrating medical entity label semantics.

Li WEI ; Dechun ZHAO ; Lu QIN ; Yanghuazi LIU ; Yuchen SHEN ; Changrong YE

Journal of Biomedical Engineering 2025;42(2):326-333

Automatic classification of medical questions is of great significance in improving the quality and efficiency of online medical services, and belongs to the task of intent recognition. Models that jointly perform entity recognition and intent recognition outperform single-task models. Currently, most publicly available medical text intent recognition datasets lack entity annotation, and manual annotation of these entities requires a lot of time and manpower. To solve this problem, this paper proposes a medical text classification model, bidirectional encoder representation based on transformer-recurrent convolutional neural network-entity-label-semantics (BRELS), which integrates medical entity label semantics. The model first uses an adaptive fusion mechanism to absorb prior knowledge of medical entity labels, achieving local feature enhancement. Then, in global feature extraction, a lightweight recurrent convolutional neural network (LRCNN) is used to suppress parameter growth while preserving the original semantics of the text. Ablation and comparison experiments are conducted on three public medical text intent recognition datasets to validate the performance of the model. The F1 score reaches 87.34%, 81.71%, and 77.74% on the three datasets, respectively. These results indicate that the BRELS model can effectively identify and understand medical terminology and thereby identify users' intentions, which can improve the quality and efficiency of online medical services.
Semantics ; Neural Networks, Computer ; Humans ; Natural Language Processing

2.Performance of a prompt engineering method for extracting individual risk factors of precocious puberty from electronic medical records.

Feixiang ZHOU ; Taowei ZHONG ; Guiyan YANG ; Xianglong DING ; Yan YAN

Journal of Central South University(Medical Sciences) 2025;50(7):1224-1233

OBJECTIVES: Accurate identification of risk factors for precocious puberty is essential for clinical diagnosis and management, yet the performance of natural language processing methods applied to unstructured electronic medical record (EMR) data remains to be fully evaluated. This study aims to assess the performance of a prompt engineering method for extracting individual risk factors of precocious puberty from EMRs. METHODS: Based on the capacity and role-insight-statement-personality-experiment (CRISPE) prompt framework, both simple and optimized prompts were designed to guide the large language model GLM-4-9B in extracting 10 types of risk factors for precocious puberty from 653 EMRs. Accuracy, precision, recall, and F1-score were used as evaluation metrics for the information extraction task. RESULTS: Under simple and optimized prompt conditions, the overall accuracy, precision, recall, and F1-score of the model were 84.18%, 98.09%, 81.99%, and 89.32% versus 97.15%, 98.31%, 98.16%, and 98.23%, respectively. The optimized prompts achieved more stable performance across age (<9 years vs ≥9 years) and visit-time (<2023 vs ≥2023) subgroups compared with simple prompts. With simple prompts, the accuracy for extracting each risk factor ranged from 60.03% to 97.24%; with optimized prompts, the range improved to 92.19%-99.85%. The largest performance improvement occurred for "beverage intake" (60.03% vs 92.19%), and the smallest for "maternal age of menarche" (97.24% vs 99.23%). In comparing distributions among simple prompts, optimized prompts, and ground truth, statistically significant differences were observed for snack intake, beverage intake, soy milk intake, honey intake, supplement use, tonic use, sleep quality, and sleeping with the light on (all P<0.001), while exercise (P=0.966) and maternal menarche age (P=0.952) showed no significant differences.
CONCLUSIONS: Compared with simple prompts, optimized prompts substantially improved the extraction performance of individual risk factors for precocious puberty from EMRs, underscoring the critical role of prompt engineering in enhancing large language model performance.
Humans ; Puberty, Precocious/epidemiology* ; Risk Factors ; Electronic Health Records ; Female ; Child ; Natural Language Processing
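
The four evaluation metrics named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names and the idea of working from confusion counts are assumptions. The only figures taken from the abstract are the reported precision and recall pairs, which are used to check that the reported F1-scores follow from them.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion counts.

    tp/fp/fn/tn are hypothetical counts of extracted risk-factor values;
    the abstract does not report them.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }


def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check against the precision/recall pairs in the abstract.
simple_f1 = f1_from_pr(0.9809, 0.8199)      # simple prompts
optimized_f1 = f1_from_pr(0.9831, 0.9816)   # optimized prompts
print(f"{simple_f1:.2%}  {optimized_f1:.2%}")
```

Because F1 is a harmonic mean, the low recall under simple prompts pulls F1 well below the precision, which is why the gain from optimized prompts shows up mainly in recall and F1.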

3.An antibacterial peptides recognition method based on BERT and Text-CNN.

Xiaofang XU ; Chunde YANG ; Kunxian SHU ; Xinpu YUAN ; Mocheng LI ; Yunping ZHU ; Tao CHEN

Chinese Journal of Biotechnology 2023;39(4):1815-1824

Antimicrobial peptides (AMPs) are small molecule peptides that are widely found in living organisms, with broad-spectrum antibacterial activity and immunomodulatory effects. Owing to the slower emergence of resistance, excellent clinical potential, and wide range of applications, AMPs are a strong alternative to conventional antibiotics. AMP recognition is a significant direction in the field of AMP research. The high cost, low efficiency, and long turnaround of wet-lab methods prevent them from meeting the need for large-scale AMP recognition. Therefore, computer-aided identification methods are important supplements to AMP recognition approaches, and one of the key issues is how to improve their accuracy. Protein sequences can be approximated as a language composed of amino acids. Consequently, rich features may be extracted using natural language processing (NLP) techniques. In this paper, we combine the pre-trained model BERT with the fine-tuned structure Text-CNN from the field of NLP to model protein languages, develop an open-source antimicrobial peptide recognition tool, and compare it with five other published tools. The experimental results show that the optimization of the two-phase training approach brings an overall improvement in accuracy, sensitivity, specificity, and Matthews correlation coefficient, offering a novel approach for further research on AMP recognition.
Anti-Bacterial Agents/chemistry* ; Amino Acid Sequence ; Antimicrobial Cationic Peptides/chemistry* ; Antimicrobial Peptides ; Natural Language Processing
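
For readers less familiar with the metrics listed in the abstract above, the following is a minimal sketch of how accuracy, sensitivity, specificity, and the Matthews correlation coefficient are computed for a binary AMP / non-AMP classifier. The function name and the confusion counts are illustrative assumptions; they are not results from the paper.

```python
import math


def binary_classifier_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion counts (positive class = AMP)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts for illustration only.
acc, sens, spec, mcc = binary_classifier_metrics(tp=90, fp=20, fn=10, tn=80)
print(f"acc={acc:.3f} sens={sens:.3f} spec={spec:.3f} mcc={mcc:.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it a useful complement to accuracy when AMP and non-AMP sequences are not equally represented.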

4.Survey on natural language processing in medical image analysis.

Zhengliang LIU ; Mengshen HE ; Zuowei JIANG ; Zihao WU ; Haixing DAI ; Lian ZHANG ; Siyi LUO ; Tianle HAN ; Xiang LI ; Xi JIANG ; Dajiang ZHU ; Xiaoyan CAI ; Bao GE ; Wei LIU ; Jun LIU ; Dinggang SHEN ; Tianming LIU

Journal of Central South University(Medical Sciences) 2022;47(8):981-993

5.Automatic labeling and extraction of terms in natural language processing in acupuncture clinical literature.

Hua-Yun LIU ; Chen-Jing HAN ; Jie XIONG ; Hai-Yan LI ; Lei LEI ; Bao-Yan LIU

Chinese Acupuncture & Moxibustion 2022;42(3):327-331

6.Artificial intelligence based Chinese clinical trials eligibility criteria classification.

Hui ZONG ; Zeyu ZHANG ; Jinxuan YANG ; Jianbo LEI ; Zuofeng LI ; Tianyong HAO ; Xiaoyan ZHANG

Journal of Biomedical Engineering 2021;38(1):105-110

Subject recruitment is a key component that affects the progress and results of clinical trials, and is generally conducted according to eligibility criteria (including inclusion criteria and exclusion criteria). Semantic category analysis of eligibility criteria can help optimize clinical trial design and build automated patient recruitment systems. This study explored the automatic semantic classification of Chinese eligibility criteria based on artificial intelligence through an academic shared task. We collected a total of 38 341 annotated eligibility criteria sentences and predefined 44 semantic categories. A total of 75 teams participated in the competition, of which 27 submitted system outputs. Based on the results, we found that most teams adopted mixed models. The mainstream solution was to apply pre-trained language models capable of providing rich semantic representations, combine them with neural network models, and fine-tune them for the classification task; classification performance could then be further improved by ensemble modeling. The best-performing system achieved a macro
Artificial Intelligence ; China ; Humans ; Language ; Natural Language Processing ; Neural Networks, Computer

7.Health Information Technology Trends in Social Media: Using Twitter Data

Jisan LEE ; Jeongeun KIM ; Yeong Joo HONG ; Meihua PIAO ; Ahjung BYUN ; Healim SONG ; Hyeong Suk LEE

Healthcare Informatics Research 2019;25(2):99-105

OBJECTIVES: This study analyzed the health technology trends and sentiments of users using Twitter data in an attempt to examine the public's opinions and identify their needs. METHODS: Twitter data related to health technology, from January 2010 to October 2016, were collected. An ontology related to health technology was developed. Frequently occurring keywords were analyzed and visualized with the word cloud technique. The keywords were then reclassified and analyzed using the developed ontology and sentiment dictionary. Python and the R program were used for crawling, natural language processing, and sentiment analysis. RESULTS: In the developed ontology, the keywords are divided into ‘health technology’ and ‘health information’. Under health technology, there are six subcategories, namely, health technology, wearable technology, biotechnology, mobile health, medical technology, and telemedicine. Under health information, there are four subcategories, namely, health information, privacy, clinical informatics, and consumer health informatics. The number of tweets about health technology has consistently increased since 2010; the number of posts in 2014 was double that in 2010, which was about 150 thousand posts. Posts about mHealth accounted for the majority, and the dominant words were ‘care’, ‘new’, ‘mental’, and ‘fitness’. Sentiment analysis by subcategory showed that most of the posts in nearly all subcategories had a positive tone with a positive score. CONCLUSIONS: Interest in mHealth has risen recently, and consequently, posts about mHealth were the most frequent. Examining social media users' responses to new health technology can be a useful method to understand the trends in rapidly evolving fields.
Biomedical Technology ; Biotechnology ; Boidae ; Data Mining ; Informatics ; Medical Informatics ; Methods ; Natural Language Processing ; Privacy ; Public Opinion ; Social Media ; Telemedicine
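
The keyword-frequency and dictionary-based sentiment steps described in the abstract above can be illustrated with a short sketch. Everything here is assumed for illustration: the example posts, the stopword list, and the tiny sentiment lexicon are hypothetical and are not the study's data, ontology, or dictionary.

```python
import re
from collections import Counter

# Hypothetical posts, stopwords and lexicon (not from the study).
posts = [
    "New wearable tracks fitness and sleep",
    "Great telemedicine care for mental health",
    "Privacy worries about mobile health care apps",
]
STOPWORDS = {"and", "for", "about"}
SENTIMENT_LEXICON = {"great": 1, "worries": -1}


def tokenize(text):
    """Lowercase a post and split it into alphabetic tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def keyword_counts(texts):
    """Count how often each keyword occurs across all posts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def sentiment_score(text):
    """Sum lexicon scores of the tokens; words not in the lexicon score 0."""
    return sum(SENTIMENT_LEXICON.get(t, 0) for t in tokenize(text))


counts = keyword_counts(posts)
scores = [sentiment_score(p) for p in posts]
print(counts.most_common(3))
print(scores)
```

The keyword counts are the kind of input a word cloud is drawn from, and per-post scores could be averaged within each ontology subcategory to obtain a tone per subcategory.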

8.Improving spaCy dependency annotation and PoS tagging web service using independent NER services

Nico COLIC ; Fabio RINALDI

Genomics & Informatics 2019;17(2):e21-

9.Towards cross-platform interoperability for machine-assisted text annotation

Richard ECKART DE CASTILHO ; Nancy IDE ; Jin Dong KIM ; Jan Christoph KLIE ; Keith SUDERMAN

Genomics & Informatics 2019;17(2):e19-

10.OryzaGP: rice gene and protein dataset for named-entity recognition

Pierre LARMANDE ; Huy DO ; Yue WANG

Genomics & Informatics 2019;17(2):e17-

Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins, and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species, and was downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.
Benchmarking ; Biology ; Data Mining ; Dataset ; Machine Learning ; Methods ; Molecular Biology ; Natural Language Processing ; Oryza ; Plants