1.KG-CNNDTI: a knowledge graph-enhanced prediction model for drug-target interactions and application in virtual screening of natural products against Alzheimer's disease.
Chengyuan YUE ; Baiyu CHEN ; Long CHEN ; Le XIONG ; Changda GONG ; Ze WANG ; Guixia LIU ; Weihua LI ; Rui WANG ; Yun TANG
Chinese Journal of Natural Medicines (English Ed.) 2025;23(11):1283-1292
Accurate prediction of drug-target interactions (DTIs) plays a pivotal role in drug discovery, facilitating optimization of lead compounds, drug repurposing and elucidation of drug side effects. However, traditional DTI prediction methods are often limited by incomplete biological data and insufficient representation of protein features. In this study, we proposed KG-CNNDTI, a novel knowledge graph-enhanced framework for DTI prediction, which integrates heterogeneous biological information to improve model generalizability and predictive performance. The proposed model utilized protein embeddings derived from a biomedical knowledge graph via the Node2Vec algorithm, which were further enriched with contextualized sequence representations obtained from ProteinBERT. For compound representation, multiple molecular fingerprint schemes alongside the Uni-Mol pre-trained model were evaluated. The fused representations served as inputs to both classical machine learning models and a convolutional neural network-based predictor. Experimental evaluations across benchmark datasets demonstrated that KG-CNNDTI achieved superior performance compared to state-of-the-art methods, particularly in terms of Precision, Recall, F1-Score and area under the precision-recall curve (AUPR). Ablation analysis highlighted the substantial contribution of knowledge graph-derived features. Moreover, KG-CNNDTI was employed for virtual screening of natural products against Alzheimer's disease, resulting in 40 candidate compounds. Five of these were supported by literature evidence, and three of those five were further validated by in vitro assays.
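The abstract does not say how the protein and compound representations are fused. As a minimal sketch, assuming simple concatenation (an assumption, not the authors' stated method), the input to a downstream predictor could be assembled as follows. The function name and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    kg_embedding: Sequence[float],
    seq_embedding: Sequence[float],
    compound_features: Sequence[float],
) -> List[float]:
    """Concatenate protein and compound vectors into one input vector.

    Concatenation is an assumed fusion strategy for illustration only.
    """
    return [*kg_embedding, *seq_embedding, *compound_features]


# Toy vectors; real embeddings would be far longer.
kg_vec = [0.12, -0.40, 0.05]        # hypothetical Node2Vec embedding of a protein node
seq_vec = [0.30, 0.90]              # hypothetical ProteinBERT sequence representation
fingerprint = [1.0, 0.0, 1.0, 1.0]  # hypothetical compound fingerprint bits

fused = fuse_features(kg_vec, seq_vec, fingerprint)
print(len(fused))
```

An ablation of the knowledge graph features, as mentioned in the abstract, would correspond to dropping `kg_vec` from the concatenation and retraining.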
Alzheimer Disease/drug therapy*; Biological Products/therapeutic use*; Humans; Neural Networks, Computer; Machine Learning; Drug Discovery/methods*; Algorithms; Drug Evaluation, Preclinical/methods*
2.Predicting Diabetic Retinopathy Using a Machine Learning Approach Informed by Whole-Exome Sequencing Studies.
Chong Yang SHE ; Wen Ying FAN ; Yun Yun LI ; Yong TAO ; Zu Fei LI
Biomedical and Environmental Sciences 2025;38(1):67-78
OBJECTIVE:
To establish and validate a novel diabetic retinopathy (DR) risk-prediction model using a whole-exome sequencing (WES)-based machine learning (ML) method.
METHODS:
WES was performed to identify potential single nucleotide polymorphism (SNP) or mutation sites in a DR pedigree comprising 10 members. A prediction model was established and validated in a cohort of 420 type 2 diabetic patients based on both genetic and demographic features. The contribution of each feature was assessed using SHapley Additive exPlanations (SHAP) analysis. The efficacies of the models with and without SNPs were compared.
RESULTS:
WES revealed that seven SNPs/mutations (rs116911833 in TRIM7, 1997T>C in LRBA, 1643T>C in PRMT10, rs117858678 in C9orf152, rs201922794 in CLDN25, rs146694895 in SH3GLB2, and rs201407189 in FANCC) were associated with DR. Notably, the model including rs146694895 and rs201407189 achieved better performance in predicting DR (accuracy: 80.2%; sensitivity: 83.3%; specificity: 76.7%; area under the receiver operating characteristic curve [AUC]: 80.0%) than the model without these SNPs (accuracy: 79.4%; sensitivity: 80.3%; specificity: 78.3%; AUC: 79.3%).
CONCLUSION:
Novel SNP sites associated with DR were identified in the DR pedigree. Inclusion of rs146694895 and rs201407189 significantly enhanced the performance of the ML-based DR prediction model.
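The accuracy, sensitivity, and specificity figures reported above are all derived from a confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct calls / all cases
        "sensitivity": tp / (tp + fn),   # detected cases / all true cases
        "specificity": tn / (tn + fp),   # cleared controls / all true controls
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=40, fp=15, tn=35, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing two models, such as those with and without the SNP features, amounts to computing these quantities on the same validation set for each model.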
Diabetic Retinopathy/diagnosis*; Humans; Machine Learning; Male; Female; Polymorphism, Single Nucleotide; Middle Aged; Exome Sequencing; Aged; Adult; Pedigree; Diabetes Mellitus, Type 2/complications*; Genetic Predisposition to Disease; Mutation
3.Predicting Postoperative Circulatory Complications in Older Patients: A Machine Learning Approach.
Xiao Yun HU ; Wei Xuan SHENG ; Kang YU ; Jie Tai DUO ; Peng Fei LIU ; Ya Wei LI ; Dong Xin WANG ; Hui Hui MIAO
Biomedical and Environmental Sciences 2025;38(3):328-340
OBJECTIVE:
This study uses machine learning algorithms to identify key determinants for predicting postoperative circulatory complications (PCCs) in older patients.
METHODS:
This secondary analysis of data from a randomized controlled trial involved 1,720 elderly participants in five tertiary hospitals in Beijing, China. Participants were aged 60-90 years and underwent major non-cardiac surgery under general anesthesia. The primary outcome was the occurrence of PCCs, defined according to the diagnostic criteria of the European Society of Cardiology and the European Society of Anaesthesiology. The analysis included 67 candidate variables, covering baseline characteristics, laboratory tests, and scale assessments.
RESULTS:
Our feature selection process identified key variables that significantly impact patient outcomes, including the duration of ICU stay, surgery, and anesthesia; APACHE-II score; intraoperative average heart rate and blood loss; cumulative opioid use during surgery; patient age; VAS-Move-Median score on the 1st to 3rd day; Charlson comorbidity score; volumes of intraoperative plasma, crystalloid, and colloid fluids; cumulative red blood cell transfusion during surgery; and endotracheal intubation duration. Notably, our Random Forest model demonstrated exceptional performance with an accuracy of 0.9872.
CONCLUSION:
We have developed and validated an algorithm for predicting PCCs in elderly patients by identifying key risk factors.
Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Cardiovascular Diseases/etiology*; Machine Learning; Postoperative Complications/etiology*; Risk Factors; Randomized Controlled Trials as Topic; Secondary Data Analysis
4.Analysis of Tongue and Face Image Features of Anemic Women and Construction of Risk-Screening Model.
Hong Yuan FU ; Yi CHUN ; Ya Han ZHANG ; Yu WANG ; Yu Lin SHI ; Tao JIANG ; Xiao Juan HU ; Li Ping TU ; Yong Zhi LI ; Jia Tuo XU
Biomedical and Environmental Sciences 2025;38(8):935-951
OBJECTIVE:
To identify the key features of facial and tongue images associated with anemia in female populations, establish anemia risk-screening models, and evaluate their performance.
METHODS:
A total of 533 female participants (anemic and healthy) were recruited from Shuguang Hospital. Facial and tongue images were collected using the TFDA-1 tongue and face diagnosis instrument. Color and texture features from various parts of facial and tongue images were extracted using Face Diagnosis Analysis System (FDAS) and Tongue Diagnosis Analysis System version 2.0 (TDAS v2.0). Least Absolute Shrinkage and Selection Operator (LASSO) regression was used for feature selection. Ten machine learning models and one deep learning model (ResNet50V2 + Conv1D) were developed and evaluated.
RESULTS:
Anemic women showed lower a-values and higher L- and b-values across all age groups. Texture feature analysis showed that women aged 30-39 with anemia had higher angular second moment (ASM) and lower entropy (ENT) values in facial images, while those aged 40-49 had lower contrast (CON), ENT, and MEAN values in tongue images but higher ASM. Anemic women exhibited age-related trends similar to healthy women, with decreasing L-values and increasing a-, b-, and ASM-values. LASSO identified 19 key features from 62. Among classifiers, the Artificial Neural Network (ANN) model achieved the best performance [area under the curve (AUC): 0.849, accuracy: 0.781]. The ResNet50V2 model achieved comparable results [AUC: 0.846, accuracy: 0.818].
CONCLUSION:
Differences in facial and tongue images suggest that color and texture features can serve as potential traditional Chinese medicine (TCM) phenotypes and auxiliary diagnostic indicators for anemia in women.
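The abstract does not describe how FDAS and TDAS compute ASM, ENT, and CON. A common way to define these texture features is from a gray-level co-occurrence matrix (GLCM). The sketch below assumes that definition, a horizontal one-pixel offset, and a base-2 logarithm for entropy, all of which are illustrative choices rather than the instruments' documented behavior.

```python
import math
from typing import Dict, List


def glcm(image: List[List[int]], levels: int) -> List[List[float]]:
    """Normalized co-occurrence matrix for pixel pairs (x, y) and (x + 1, y)."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def texture_features(p: List[List[float]]) -> Dict[str, float]:
    """Angular second moment, entropy, and contrast of a normalized GLCM."""
    asm = sum(v * v for row in p for v in row)
    ent = -sum(v * math.log2(v) for row in p for v in row if v > 0)
    con = sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))
    return {"ASM": asm, "ENT": ent, "CON": con}


# Two toy 2-level images: a flat patch and a checkerboard.
flat = [[0, 0, 0], [0, 0, 0]]
checker = [[0, 1, 0], [1, 0, 1]]

flat_features = texture_features(glcm(flat, levels=2))
checker_features = texture_features(glcm(checker, levels=2))
print(flat_features)
print(checker_features)
```

Under these assumptions a uniform region gives high ASM with low entropy and contrast, while a finely varying region gives the opposite pattern.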
Humans; Female; Tongue/diagnostic imaging*; Adult; Anemia/diagnosis*; Middle Aged; Face/diagnostic imaging*; Young Adult; Machine Learning
5.Risk prediction of demoralization syndrome in patients with oral cancer.
Liyan MAO ; Xixi YANG ; Xiaoqin BI ; Min LIU ; Chongyang ZHAO ; Zuozhen WEN
West China Journal of Stomatology 2025;43(3):395-405
OBJECTIVES:
This study aimed to construct a risk prediction model for demoralization syndrome in patients with oral cancer and to provide a scientific basis for preventing this syndrome and developing personalized care programs.
METHODS:
A total of 486 patients with oral cancer in West China Hospital of Stomatology of Sichuan University and Sun Yat-sen Memorial Hospital of Sun Yat-sen University from March to July 2024 were selected by convenience sampling. We integrated clinical data and evidence from previous studies to identify the key variables affecting demoralization syndrome in patients with oral cancer. The 486 patients were divided into a training set and a validation set in an 8∶2 ratio. A clinical risk prediction model was established based on the individual data of 365 patients in the development cohort. Through least absolute shrinkage and selection operator (LASSO) regression, a risk prediction model for moderate-to-severe demoralization syndrome in oral cancer was constructed, and a clinical machine-learning nomogram was built. Bootstrap resampling was used for internal validation. The model was externally validated on the data of the 121 patients in the validation cohort.
RESULTS:
The incidence of demoralization syndrome in patients with oral cancer was 405 cases (83.3%), of which 279 cases (57.4%) were mild, 176 cases (36.2%) were moderate, and 31 cases (6.4%) were severe. The core model, including patient education level, disease understanding, and MDASI-HN score, was used to predict the risk of outcome. Internal validation of the model yielded a C statistic of 0.7836 (95% CI: 0.78-0.87), a beta of 0.8434, and a calibration intercept of -0.0406. In external validation, the validation set C statistic was 0.80 (95% CI: 0.71-0.87), beta was 0.80, and the calibration intercept was -0.08.
CONCLUSIONS:
Our risk prediction model of demoralization syndrome in patients with oral cancer performed robustly in validation cohorts from different nursing environments. The model has good calibration and good discrimination and can be used as an evaluation and prediction item at admission.
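The C statistic reported above measures discrimination: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. A minimal pairwise sketch follows. The labels and predicted risks are hypothetical and are not the study's data.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance over all (event, non-event) pairs; tied scores count as 0.5."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("Need at least one event and one non-event.")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes (1 = moderate-to-severe) and predicted risks.
labels = [1, 1, 1, 0, 0]
risks = [0.9, 0.6, 0.4, 0.5, 0.2]

c_value = c_statistic(labels, risks)
print(round(c_value, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. For a binary outcome this quantity equals the area under the ROC curve.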
Humans; Mouth Neoplasms/complications*; Male; Female; Nomograms; Middle Aged; Syndrome; Aged; Adult; Risk Factors; Risk Assessment; Machine Learning
6.Machine learning-based prediction model for caries in the first molars of 9-year-old children in Suzhou.
Lingzhi CHEN ; Xiaqin WANG ; Kaifei ZHU ; Kun REN ; Zhen WU
West China Journal of Stomatology 2025;43(6):871-880
OBJECTIVES:
This study aimed to use machine learning algorithms to build a prediction model for first permanent molar caries in 9-year-old children in Suzhou and to identify risk factors.
METHODS:
Stratified random cluster sampling was applied to select 9-year-old students from 38 primary schools in 14 townships and streets in Wuzhong District for oral examination and questionnaire survey. Multivariable logistic regression was used to analyze the risk factors of tooth decay. The data set was randomly divided into training and validation sets in an 8∶2 ratio, and R 4.3.1 was used to build five machine learning models: random forest, decision tree, extreme gradient boosting (XGBoost), logistic regression, and light gradient boosting machine (LightGBM). The predictive performance of these five models was evaluated using the area under the receiver operating characteristic curve (AUC). The marginal contribution of each feature to the caries prediction model was determined through Shapley additive explanations (SHAP).
RESULTS:
This study included 7,225 samples that met the inclusion criteria. The caries rate of the first permanent molar was 54.96%. Multivariable logistic regression analysis showed that sweet drinks, dessert and candy, snack frequency, and snacks before going to bed after brushing teeth were correlated with the occurrence of first permanent molar caries (P<0.05). The AUC values of decision tree, logistic regression, LightGBM, random forest, and XGBoost were 75.5%, 83.9%, 88.6%, 88.9%, and 90.1%, respectively. Among the one-hot encoded variables, high-frequency sweet intake (such as dessert and candy ≥2 times a day, mother's sugary diet ≥2 times a day) and poor oral hygiene habits (such as frequent snacks before going to bed after brushing teeth and irregular tooth brushing) had the highest positive SHAP values.
CONCLUSIONS:
The XGBoost algorithm has a good prediction effect for first permanent molar caries in 9-year-old children. High-frequency sweet intake and poor oral hygiene habits have a strong positive impact on the risk of first permanent molar caries and are key drivers that can be used in the formulation of targeted interventions.
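SHAP builds on Shapley values from cooperative game theory, which attribute a prediction to features by averaging each feature's marginal contribution over all orderings. The sketch below computes exact Shapley values for a toy risk function. The feature names and risk increments are hypothetical and do not reproduce the study's XGBoost model, and the SHAP library itself uses faster model-specific algorithms.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    value_fn: Callable[[FrozenSet[str]], float], features: List[str]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every ordering of the features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for feature in order:
            before = value_fn(frozenset(present))
            present.add(feature)
            # Marginal contribution of adding this feature to the coalition.
            totals[feature] += value_fn(frozenset(present)) - before
    return {f: t / len(orderings) for f, t in totals.items()}


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical caries risk given which risk factors are present."""
    risk = 0.10
    if "sweets" in present:
        risk += 0.20
    if "bedtime_snack" in present:
        risk += 0.10
    if "sweets" in present and "bedtime_snack" in present:
        risk += 0.10  # assumed interaction between the two habits
    return risk


phi = shapley_values(toy_risk, ["sweets", "bedtime_snack", "irregular_brushing"])
print(phi)
```

The values sum to the difference between the risk with all factors present and the baseline risk, and the interaction term is split equally between the two features involved.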
Humans; Dental Caries/epidemiology*; Child; Machine Learning; China/epidemiology*; Molar; Risk Factors; Female; Logistic Models; Male; Decision Trees; Algorithms
7.Artificial intelligence-assisted design, mining, and modification of CRISPR-Cas systems.
Yufeng MAO ; Guangyun CHU ; Qingling LIANG ; Ye LIU ; Yi YANG ; Xiaoping LIAO ; Meng WANG
Chinese Journal of Biotechnology 2025;41(3):949-967
With the rapid advancement of synthetic biology, CRISPR-Cas systems have emerged as a powerful tool for gene editing, demonstrating significant potential in various fields, including medicine, agriculture, and industrial biotechnology. This review comprehensively summarizes the significant progress in applying artificial intelligence (AI) technologies to the design, mining, and modification of CRISPR-Cas systems. AI technologies, especially machine learning, have revolutionized sgRNA design by analyzing high-throughput sequencing data, thereby improving the editing efficiency and predicting off-target effects with high accuracy. Furthermore, this paper explores the role of AI in sgRNA design and evaluation, highlighting its contributions to the annotation and mining of CRISPR arrays and Cas proteins, as well as its potential for modifying key proteins involved in gene editing. These advancements have not only improved the efficiency and precision of gene editing but also expanded the horizons of genome engineering, paving the way for intelligent and precise genome editing.
CRISPR-Cas Systems/genetics*; Artificial Intelligence; Gene Editing/methods*; RNA, Guide, CRISPR-Cas Systems/genetics*; Machine Learning; Humans; Genetic Engineering/methods*; Synthetic Biology
8.Intelligent mining, engineering, and de novo design of proteins.
Cui LIU ; Zhenkun SHI ; Hongwu MA ; Xiaoping LIAO
Chinese Journal of Biotechnology 2025;41(3):993-1010
Natural biological components, shaped by long-term evolution, serve the survival needs of cells, but they often fail to meet the demands of engineered cells that must perform biological functions efficiently in special industrial environments. Enzymes, as biological catalysts, play a key role in biosynthetic pathways, significantly enhancing the rate and selectivity of biochemical reactions. However, the catalytic efficiency, stability, substrate specificity, and tolerance of natural enzymes often fall short of industrial production requirements. Therefore, exploring and modifying enzymes to suit specific biomanufacturing processes has become crucial. In recent years, artificial intelligence (AI) has played an increasingly important role in the discovery, evaluation, engineering, and de novo design of proteins. AI can accelerate the discovery and optimization of proteins by analyzing large amounts of bioinformatics data and predicting protein functions and characteristics with machine learning and deep learning algorithms. Moreover, AI can assist researchers in designing new protein structures by simulating and predicting their performance under different conditions, providing guidance for protein design. This paper reviews the latest research advances in protein discovery, evaluation, engineering, and de novo design for biomanufacturing and explores the hot topics, challenges, and emerging technical methods in this field, aiming to provide guidance and inspiration for researchers in related fields.
Protein Engineering/methods*; Artificial Intelligence; Proteins/genetics*; Computational Biology; Machine Learning; Data Mining; Algorithms; Deep Learning
9.Intelligent design of transcription factor-based biosensors.
Chaoning LIANG ; La XIANG ; Shuangyan TANG
Chinese Journal of Biotechnology 2025;41(3):1011-1022
Transcription factor (TF)-based biosensors have been widely applied in fields such as metabolic engineering, synthetic biology, and metabolite monitoring. These biosensors are valued for their high orthogonality, modularity, and operability. However, most natural TFs respond weakly and with low specificity, and still require optimization to reach the desired performance in applications. Herein, we comprehensively summarize the recent advances in the engineering and optimization of TF-based biosensors with the assistance of computational simulation and artificial intelligence. This review covers regulatory protein engineering aided by protein structure prediction and ligand binding simulation, as well as the prediction of regulatory protein responses by mathematical models obtained from machine learning on mutagenesis data. In comparison with conventional tools, computational simulation and artificial intelligence enable more accurate and rapid design and construction of biosensors. Thus, these technologies will greatly promote the development of novel biosensors for applications.
Biosensing Techniques/methods*; Transcription Factors/metabolism*; Artificial Intelligence; Protein Engineering/methods*; Computer Simulation; Synthetic Biology; Machine Learning
10.Machine learning-aided design of synthetic biological parts and circuits.
Chinese Journal of Biotechnology 2025;41(3):1023-1051
Synthetic biology is an emerging interdisciplinary field at the convergence of biology, engineering, and computer science. It employs a bottom-up approach to progressively design biological parts, devices, and circuits, aiming to create artificial biological systems not found in nature or to redesign existing biological systems for specific purposes. With the rapid development of the synthetic biology industry, there is an increasing demand for large complex genetic circuits. However, the traditional trial-and-error methods, heavily reliant on empirical knowledge, have limited efficiency and success rates of parts/circuits construction, thereby impeding the innovation and technology translation for synthetic biology. These limitations have prompted a paradigm shift from labor-intensive, experience-driven trial-and-error models towards standardized, intelligent engineering approaches. Machine learning, capable of uncovering hidden structures and relationships within biological data, offers robust support for the intelligent design of synthetic biological parts and genetic circuits. Here, we review commonly used machine learning algorithms and analyze their typical applications in designing biological parts (e.g., synthetic promoters, RNA regulatory elements, and transcription factors) and simple genetic circuits. Additionally, we discuss the primary challenges in machine learning-aided design and propose potential solutions. Lastly, we envision the future trend of integrating machine learning with synthetic biological system design, highlighting the importance of interdisciplinary collaboration.
Synthetic Biology/methods*; Machine Learning; Gene Regulatory Networks; Algorithms
