1.KG-CNNDTI: a knowledge graph-enhanced prediction model for drug-target interactions and application in virtual screening of natural products against Alzheimer's disease.
Chengyuan YUE ; Baiyu CHEN ; Long CHEN ; Le XIONG ; Changda GONG ; Ze WANG ; Guixia LIU ; Weihua LI ; Rui WANG ; Yun TANG
Chinese Journal of Natural Medicines (English Ed.) 2025;23(11):1283-1292
Accurate prediction of drug-target interactions (DTIs) plays a pivotal role in drug discovery, facilitating optimization of lead compounds, drug repurposing and elucidation of drug side effects. However, traditional DTI prediction methods are often limited by incomplete biological data and insufficient representation of protein features. In this study, we proposed KG-CNNDTI, a novel knowledge graph-enhanced framework for DTI prediction, which integrates heterogeneous biological information to improve model generalizability and predictive performance. The proposed model utilized protein embeddings derived from a biomedical knowledge graph via the Node2Vec algorithm, which were further enriched with contextualized sequence representations obtained from ProteinBERT. For compound representation, multiple molecular fingerprint schemes alongside the Uni-Mol pre-trained model were evaluated. The fused representations served as inputs to both classical machine learning models and a convolutional neural network-based predictor. Experimental evaluations across benchmark datasets demonstrated that KG-CNNDTI achieved superior performance compared to state-of-the-art methods, particularly in terms of Precision, Recall, F1-Score and area under the precision-recall curve (AUPR). Ablation analysis highlighted the substantial contribution of knowledge graph-derived features. Moreover, KG-CNNDTI was employed for virtual screening of natural products against Alzheimer's disease, resulting in 40 candidate compounds. Of these, five were supported by literature evidence, three of which were further validated by in vitro assays.
Alzheimer Disease/drug therapy*; Biological Products/therapeutic use*; Humans; Neural Networks, Computer; Machine Learning; Drug Discovery/methods*; Algorithms; Drug Evaluation, Preclinical/methods*
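The abstract above describes fusing protein and compound representations and reporting AUPR, without specifying how the fusion is done. The following is a minimal sketch under the assumption that fusion is plain concatenation; the function names fuse_features and average_precision are hypothetical and are not taken from the paper. The AUPR estimate used here is the step-wise average precision.

```python
from typing import List, Sequence


def fuse_features(kg_emb: Sequence[float],
                  seq_emb: Sequence[float],
                  compound_fp: Sequence[float]) -> List[float]:
    """Concatenate a knowledge-graph embedding, a sequence embedding and a
    compound fingerprint into one vector (ASSUMED fusion scheme)."""
    return [float(v) for v in (*kg_emb, *seq_emb, *compound_fp)]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at the rank of each true positive."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank pairs from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Toy vectors; real embeddings would have hundreds of dimensions.
    print(fuse_features([0.1, 0.2], [0.3], [1, 0]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

AUPR is often preferred over ROC AUC when positive drug-target pairs are much rarer than negatives, which may be why the abstract highlights it.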
2.Early warning model of postoperative infection of internal fixation device in maxillofacial fracture based on the synthetic minority over-sampling technique algorithm.
Jinfeng JIANG ; Haiyan WANG ; Yanfeng SHI ; Ke XU
West China Journal of Stomatology 2025;43(6):837-844
OBJECTIVES:
This study investigates independent risk factors for postoperative internal fixation device infection in patients with maxillofacial fractures and proposes an early warning model based on the synthetic minority over-sampling technique (SMOTE) algorithm.
METHODS:
A total of 1 104 patients who underwent surgical treatment for maxillofacial fractures at the Oral and Maxillofacial Surgery Department, Affiliated Hospital of Nantong University from January 2021 to December 2024 were retrospectively analyzed. The patients were divided into two groups based on the presence of postoperative internal fixation device infection: the infection group (27 cases) and the non-infection group (1 077 cases). Clinical data from both groups were collected and subjected to statistical analysis. Univariate and binary Logistic regression analyses were used to identify risk factors for postoperative internal fixation device infection in maxillofacial fractures. Subsequently, a Logistic regression model was established on the original dataset, the dataset was oversampled with the SMOTE algorithm, and an early warning model was constructed on the oversampled dataset. The prediction performance of the two models was compared and validated.
RESULTS:
Among the 1 104 patients who underwent surgical treatment for maxillofacial fractures, 27 cases of postoperative internal fixation device infection were identified, corresponding to an infection rate of 2.45% (27/1 104). Age, diabetes history, fracture severity, and oral hygiene status were all identified as risk factors for postoperative internal fixation device infection in maxillofacial fractures (all P<0.05). The prediction model based on the original data was designated P1, and the prediction model based on the SMOTE algorithm was designated P2. Receiver operating characteristic (ROC) curve analysis showed that the area under the curve (AUC) was 0.882 for the P2 model and 0.861 for the P1 model, indicating the superior predictive performance of the P2 model. The DeLong test showed that the difference in AUC between the two models was statistically significant (P<0.05).
CONCLUSIONS:
Age, diabetes history, postoperative fracture severity, and oral hygiene status are all risk factors for infections associated with internal fixation devices after maxillofacial fracture surgery. The proposed early warning model demonstrated good predictive performance. Medical professionals can use this model to anticipate infections related to internal fixation devices after maxillofacial fracture surgery and to intervene early.
Humans; Algorithms; Retrospective Studies; Male; Female; Fracture Fixation, Internal/instrumentation*; Risk Factors; Middle Aged; Adult; Logistic Models; Surgical Wound Infection/epidemiology*; Aged; Internal Fixators/adverse effects*; Maxillofacial Injuries/surgery*; Adolescent
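The study above relies on SMOTE to offset the imbalance between 27 infection cases and 1 077 non-infection cases. As a rough illustration of the core idea (not the authors' implementation), the sketch below creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name smote and the parameters n_new, k and seed are assumptions chosen for this example.

```python
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by linear interpolation
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy two-feature minority class; real inputs would be the numerically
    # encoded clinical variables of the infection group.
    infected = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(infected, n_new=4, k=2):
        print(point)
```

Fully balancing the two groups would require 1 050 synthetic infection cases (1 077 minus 27); the abstract does not state the ratio actually used. Oversampling is normally applied to training data only, so that validation reflects the true infection rate.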
3.Machine learning-based prediction model for caries in the first molars of 9-year-old children in Suzhou.
Lingzhi CHEN ; Xiaqin WANG ; Kaifei ZHU ; Kun REN ; Zhen WU
West China Journal of Stomatology 2025;43(6):871-880
OBJECTIVES:
This study aimed to use machine learning algorithms to build a prediction model for first permanent molar caries in 9-year-old children in Suzhou and to screen for risk factors.
METHODS:
Stratified random cluster sampling was applied to select 9-year-old students from 38 primary schools in 14 townships and streets in Wuzhong District for oral examination and a questionnaire survey. Multivariate Logistic regression was used to analyze the risk factors for tooth decay. The data set was randomly divided into a training set and a validation set at an 8:2 ratio, and R 4.3.1 was used to build models with five machine learning algorithms: random forest, decision tree, extreme gradient boosting (XGBoost), Logistic regression, and light gradient boosting machine (LightGBM). The predictive performance of the five models was evaluated using the area under the receiver operating characteristic curve (AUC). The marginal contribution of each feature to the caries prediction model was quantified through Shapley additive explanations (SHAP).
RESULTS:
This study included 7 225 samples that met the inclusion criteria. The caries rate of the first permanent molar was 54.96%. Multivariate Logistic regression analysis showed that consumption of sweet drinks, consumption of desserts and candy, snack frequency, and snacking before bed after brushing teeth were associated with the occurrence of first permanent molar caries (P<0.05). The AUC values of the decision tree, Logistic regression, LightGBM, random forest, and XGBoost models were 75.5%, 83.9%, 88.6%, 88.9%, and 90.1%, respectively. Among the one-hot-encoded variables, high-frequency sweets intake (such as desserts and candy ≥2 times a day, mother's sugary diet ≥2 times a day) and poor oral hygiene habits (such as frequent snacking before bed after brushing teeth and irregular tooth brushing) had the highest positive SHAP values.
CONCLUSIONS:
The XGBoost algorithm performs well in predicting first permanent molar caries in 9-year-old children. High-frequency sweets intake and poor oral hygiene habits have a strong positive impact on the risk of first permanent molar caries and are key drivers that can inform the formulation of targeted interventions.
Humans; Dental Caries/epidemiology*; Child; Machine Learning; China/epidemiology*; Molar; Risk Factors; Female; Logistic Models; Male; Decision Trees; Algorithms
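The caries study above splits its data 8:2 and compares models by AUC. The study itself used R 4.3.1; the Python sketch below is only an illustration of those two steps, with hypothetical function names split_indices and roc_auc. AUC is computed here as the fraction of positive-negative pairs that the model ranks correctly, with ties counted as half.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, train_frac: float = 0.8,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and cut them into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_frac))
    return idx[:cut], idx[cut:]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count correctly ordered pairs; a tie contributes one half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, valid = split_indices(7225)
    print(len(train), len(valid))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With 7 225 records, an 8:2 split gives 5 780 training and 1 445 validation records. The pairwise AUC is quadratic in sample size, which is acceptable at this scale but slower than rank-based implementations.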
4.Intelligent design of nucleic acid elements in biomanufacturing.
Jinsheng WANG ; Zhe SUN ; Xueli ZHANG
Chinese Journal of Biotechnology 2025;41(3):968-992
Nucleic acid elements are essential functional sequences that play critical roles in regulating gene expression, optimizing pathways, and enabling gene editing to enhance the production of target products in biomanufacturing. Therefore, the design and optimization of these elements are crucial in constructing efficient cell factories. Artificial intelligence (AI) provides robust support for biomanufacturing by accurately predicting functional nucleic acid elements, designing and optimizing sequences with quantified functions, and elucidating the operating mechanisms of these elements. In recent years, AI has significantly accelerated the progress in biomanufacturing by reducing experimental workloads through the design and optimization of promoters, ribosome-binding sites, terminators, and their combinations. Despite these advancements, the application of AI in biomanufacturing remains limited due to the complexity of biological systems and the lack of highly quantified training data. This review summarizes the various nucleic acid elements utilized in biomanufacturing, the tools developed for predicting and designing these elements based on AI algorithms, and the case studies showcasing the applications of AI in biomanufacturing. By integrating AI with synthetic biology and high-throughput techniques, we anticipate the development of more efficient tools for designing nucleic acid elements and accelerating the application of AI in biomanufacturing.
Artificial Intelligence; Synthetic Biology; Nucleic Acids/genetics*; Algorithms; Gene Editing; Promoter Regions, Genetic; Biotechnology/methods*
5.Intelligent mining, engineering, and de novo design of proteins.
Cui LIU ; Zhenkun SHI ; Hongwu MA ; Xiaoping LIAO
Chinese Journal of Biotechnology 2025;41(3):993-1010
Natural biological components have evolved over long periods to serve cell survival, and they often fail to meet the demands of engineered cells that must perform biological functions efficiently in special industrial environments. Enzymes, as biological catalysts, play a key role in biosynthetic pathways, significantly enhancing the rate and selectivity of biochemical reactions. However, the catalytic efficiency, stability, substrate specificity, and tolerance of natural enzymes often fall short of industrial production requirements. Therefore, exploring and modifying enzymes to suit specific biomanufacturing processes has become crucial. In recent years, artificial intelligence (AI) has played an increasingly important role in the discovery, evaluation, engineering, and de novo design of proteins. AI can accelerate the discovery and optimization of proteins by analyzing large amounts of bioinformatics data and predicting protein functions and characteristics with machine learning and deep learning algorithms. Moreover, AI can assist researchers in designing new protein structures by simulating and predicting their performance under different conditions, providing guidance for protein design. This paper reviews the latest research advances in protein discovery, evaluation, engineering, and de novo design for biomanufacturing and explores the hot topics, challenges, and emerging technical methods in this field, aiming to provide guidance and inspiration for researchers in related fields.
Protein Engineering/methods*; Artificial Intelligence; Proteins/genetics*; Computational Biology; Machine Learning; Data Mining; Algorithms; Deep Learning
6.Machine learning-aided design of synthetic biological parts and circuits.
Chinese Journal of Biotechnology 2025;41(3):1023-1051
Synthetic biology is an emerging interdisciplinary field at the convergence of biology, engineering, and computer science. It employs a bottom-up approach to progressively design biological parts, devices, and circuits, aiming to create artificial biological systems not found in nature or to redesign existing biological systems for specific purposes. With the rapid development of the synthetic biology industry, there is an increasing demand for large complex genetic circuits. However, the traditional trial-and-error methods, heavily reliant on empirical knowledge, have limited efficiency and success rates of parts/circuits construction, thereby impeding the innovation and technology translation for synthetic biology. These limitations have prompted a paradigm shift from labor-intensive, experience-driven trial-and-error models towards standardized, intelligent engineering approaches. Machine learning, capable of uncovering hidden structures and relationships within biological data, offers robust support for the intelligent design of synthetic biological parts and genetic circuits. Here, we review commonly used machine learning algorithms and analyze their typical applications in designing biological parts (e.g., synthetic promoters, RNA regulatory elements, and transcription factors) and simple genetic circuits. Additionally, we discuss the primary challenges in machine learning-aided design and propose potential solutions. Lastly, we envision the future trend of integrating machine learning with synthetic biological system design, highlighting the importance of interdisciplinary collaboration.
Synthetic Biology/methods*; Machine Learning; Gene Regulatory Networks; Algorithms
7.Research progress in mechanism models and artificial intelligence models for protein expression systems.
Yi YANG ; Jun DU ; Chunhe YANG ; Hongwu MA
Chinese Journal of Biotechnology 2025;41(3):1079-1097
Proteins are the basic building blocks of life. Studying the protein expression mechanism is essential for understanding the principles of cellular organization and for the development of biotechnology. Protein expression, involving transcription, translation, folding, and post-translational modification, is an intricately regulated process affected by various cellular components and sequence features of the expressed protein. Establishing protein expression models based on expression data is of great significance for probing the regulatory factors and mechanisms of protein expression. Here we review recent research progress in mechanism models for quantitatively simulating the protein expression process and in artificial intelligence-based prediction algorithms for analyzing the regulatory factors. Chemical reaction network models have been developed to mathematically describe the elementary processes in protein expression and simulate the influences of various cellular components such as RNA polymerase and tRNA. However, the experimental determination of the huge number of model parameters remains a major challenge. The main objective of data-driven AI models is to study the effects of the protein/DNA sequences of the target protein on its expression, and subsequently to optimize the sequences to improve protein expression. Methods combining mechanism models and AI models have the potential to deepen our understanding of protein expression processes, providing theoretical and technical support for the efficient production of high-value proteins and the coordinated regulation of different proteins.
Artificial Intelligence; Proteins/metabolism*; Algorithms; Protein Biosynthesis
8.Optimization of fermentation processes in intelligent biomanufacturing: on online monitoring, artificial intelligence, and digital twin technologies.
Jianye XIA ; Dongjiao LONG ; Min CHEN ; Anxiang CHEN
Chinese Journal of Biotechnology 2025;41(3):1179-1196
As a strategic emerging industry, biomanufacturing faces core challenges in achieving precise optimization and efficient scale-up of fermentation processes. This review focuses on two critical aspects of fermentation, real-time sensing and intelligent control, and systematically summarizes the advancements in online monitoring technologies, artificial intelligence (AI)-driven optimization strategies, and digital twin applications. First, online monitoring technologies, ranging from conventional parameters (e.g., temperature, pH, and dissolved oxygen) to advanced sensing systems (e.g., online viable cell sensors, spectroscopy, and exhaust gas analysis), provide a data foundation for real-time microbial metabolic state characterization. Second, conventional static control relying on expert experience is evolving toward AI-driven dynamic optimization. The integration of machine learning technologies (e.g., artificial neural networks and support vector machines) and genetic algorithms significantly enhances the regulation efficiency of feeding strategies and process parameters. Finally, digital twin technology, integrating real-time sensing data with multi-scale models (e.g., cellular metabolic kinetics and reactor hydrodynamics), offers a novel paradigm for lifecycle optimization and rational scale-up of fermentation. Future advancements in closed-loop control systems based on intelligent sensing and digital twin are expected to accelerate the industrialization of innovative achievements in synthetic biology and drive biomanufacturing toward higher efficiency, intelligence, and sustainability.
Artificial Intelligence; Fermentation; Bioreactors/microbiology*; Neural Networks, Computer; Algorithms; Biotechnology/methods*
9.pLM4ACP: a model for predicting anticancer peptides based on machine learning and protein language models.
Yitong LIU ; Wenxin CHEN ; Juanjuan LI ; Xue CHI ; Xiang MA ; Yanqiong TANG ; Hong LI
Chinese Journal of Biotechnology 2025;41(8):3252-3261
Cancer is a serious global health problem and a major cause of human death. Conventional cancer treatments often run the risk of impairing vital organ functions. Anticancer peptides (ACPs) are considered to be one of the most promising therapeutic agents against common human cancers due to their small sizes, high specificity, and low toxicity. Because experimental identification of ACPs is confined to the laboratory, expensive, and time-consuming, we proposed pLM4ACP, a model for predicting ACPs based on machine learning and protein language models. In this model, the protein language model ProtT5 was used to extract the features of ACPs, and the extracted features were input into the support vector machine (SVM) classification algorithm for optimization and performance evaluation. The model achieved significantly higher accuracy than other methods, with an overall accuracy of 0.763, an F1-score of 0.767, a Matthews correlation coefficient of 0.527, and an area under the curve of 0.827 on the independent test set. This study constructs an efficient anticancer peptide prediction model based on protein language models, further advancing the application of artificial intelligence in the biomedical field and promoting the development of precision medicine and computational biology.
Machine Learning; Antineoplastic Agents/chemistry*; Humans; Peptides/chemistry*; Support Vector Machine; Algorithms; Computational Biology/methods*; Neoplasms/drug therapy*
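The pLM4ACP abstract above reports accuracy, F1-score and the Matthews correlation coefficient (MCC). As a reminder of how these relate to a binary confusion matrix, here is a minimal sketch; the function name binary_metrics and the toy counts are hypothetical and are not the paper's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1-score and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / total, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy counts for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=35, fn=15).items():
        print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size, whereas accuracy alone can look good for a model that favours the larger class.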
10.A high-throughput plant canopy leaf area index inversion model based on UAV-LiDAR.
Yuming LIANG ; Xueyan FAN ; Muqing ZHANG ; Wei YAO ; Xiuhua LI ; Zeping WANG ; Sifan DONG ; Xuechen LI
Chinese Journal of Biotechnology 2025;41(10):3817-3827
To explore the feasibility of using UAV-LiDAR for measuring the leaf area index (LAI) of crop canopies, we employed UAV-LiDAR to scan sugarcane canopies during the tillering and elongation stages, acquiring canopy point cloud data. Subsequently, features such as average row height, projected row area, point cloud density at different canopy layers, and the ratios between these parameters were extracted. Three feature selection methods, namely partial least squares regression (PLSR), XGBoost feature importance (XGBoost-FI), and random forest-recursive feature elimination (RF-RFE), were adopted to evaluate and identify the optimal input variables for modeling. With these selected variables, LAI inversion models were developed based on random forest (RF) and adaptive boosting (AdaBoost) algorithms, and their performance was assessed. Among the extracted features, the projected row area Sp and the total row point count Ctotal exhibited strong correlations with LAI, with correlation coefficients of 0.73 and 0.72, respectively. The AdaBoost-based LAI inversion model, using the projected row area Sp, average height Havg, mid-layer point cloud density Cm, and total row point count Ctotal as input variables, achieved the best performance, with a coefficient of determination (Rv²) of 0.713 and a root mean square error (RMSEv) of 0.25 on the validation set. This study provides an effective method for high-throughput acquisition of LAI in field crops, offering valuable scientific support for sugarcane field management and breeding efforts.
Plant Leaves/growth & development*; Saccharum/growth & development*; Algorithms; Unmanned Aerial Devices; Remote Sensing Technology/methods*; Crops, Agricultural/growth & development*
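The LAI study above screens features by their correlation with LAI and scores models by the coefficient of determination and RMSE on a validation set. The sketch below shows one common way to compute these quantities; it assumes that the correlation is Pearson's r and that R² is defined as 1 minus the ratio of residual to total sum of squares, neither of which the abstract states. The function names and toy values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between a feature and the measured LAI."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r2_rmse(y_true: Sequence[float],
            y_pred: Sequence[float]) -> Tuple[float, float]:
    """Coefficient of determination (1 - SS_res / SS_tot) and RMSE."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(pearson_r([2.0, 4.1, 5.9, 8.2], measured))
    print(r2_rmse(measured, predicted))
```

RMSE is expressed in the same units as LAI, so the reported 0.25 can be read directly against the range of LAI values in the field.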
