1.Risk prediction of demoralization syndrome in patients with oral cancer.
Liyan MAO ; Xixi YANG ; Xiaoqin BI ; Min LIU ; Chongyang ZHAO ; Zuozhen WEN
West China Journal of Stomatology 2025;43(3):395-405
OBJECTIVES:
This study aimed to construct a risk prediction model for demoralization syndrome in patients with oral cancer, providing a scientific basis for preventing the syndrome and developing personalized care programs in this population.
METHODS:
A total of 486 patients with oral cancer treated at West China Hospital of Stomatology, Sichuan University, and Sun Yat-sen Memorial Hospital, Sun Yat-sen University, from March to July 2024 were selected by convenience sampling. Key variables affecting demoralization syndrome in patients with oral cancer were identified by integrating clinical data with evidence from previous studies. The 486 patients were divided into a training set and a validation set in an 8∶2 ratio. A clinical risk prediction model was established based on individual data from the 365 patients in the development cohort. Least absolute shrinkage and selection operator (LASSO) regression was used to construct a prediction model for moderate-to-severe demoralization syndrome in patients with oral cancer, which was presented as a clinical machine-learning nomogram. Bootstrap resampling was used for internal validation, and data from the 121 patients in the validation cohort were used for external validation.
RESULTS:
Demoralization syndrome occurred in 405 patients with oral cancer (83.3%). Of these, 279 cases (57.4%) were mild, 176 (36.2%) were moderate, and 31 (6.4%) were severe. The core model, comprising patient education level, disease understanding, and MDASI-HN score, was used to predict the risk of the outcome. Internal validation yielded a C statistic of 0.783 6 (95% CI: 0.78-0.87), a beta of 0.843 4, and a calibration intercept of -0.040 6. In external validation, the C statistic of the validation set was 0.80 (95% CI: 0.71-0.87), beta was 0.80, and the calibration intercept was -0.08.
CONCLUSIONS:
Our risk prediction model of demoralization syndrome in patients with oral cancer performed robustly in validation cohorts from different nursing settings. The model shows good calibration and good discrimination and can be used as an assessment and prediction tool at admission.
Humans
;
Mouth Neoplasms/complications*
;
Male
;
Female
;
Nomograms
;
Middle Aged
;
Syndrome
;
Aged
;
Adult
;
Risk Factors
;
Risk Assessment
;
Machine Learning
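For a binary outcome, the C statistic reported in the entry above is the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The following is a minimal sketch, assuming predicted risks from a fitted model are already available. The function name and the data are hypothetical and are not the authors' code.

```python
from itertools import product


def c_statistic(risk_scores, outcomes):
    """C statistic (concordance) for a binary outcome.

    Over every pair made of one patient with the outcome (1) and one
    without (0), count the share in which the patient with the outcome
    received the higher predicted risk; tied predictions count as 0.5.
    """
    events = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks; 1 = moderate-to-severe demoralization (invented data)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(c_statistic(scores, labels))  # 8 of 9 pairs concordant, about 0.889
```

For a binary outcome this value equals the ROC AUC. Bootstrap internal validation would repeat model fitting and this computation on resampled data to estimate optimism; that step is not shown here.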
2.Machine learning-based prediction model for caries in the first molars of 9-year-old children in Suzhou.
Lingzhi CHEN ; Xiaqin WANG ; Kaifei ZHU ; Kun REN ; Zhen WU
West China Journal of Stomatology 2025;43(6):871-880
OBJECTIVES:
This study aimed to use machine learning algorithms to build a prediction model for caries of the first permanent molars in 9-year-old children in Suzhou and to identify the associated risk factors.
METHODS:
Stratified random cluster sampling was used to select 9-year-old students from 38 primary schools in 14 townships and subdistricts of Wuzhong District for oral examination and a questionnaire survey. Multivariable logistic regression was used to analyze the risk factors for caries. The dataset was randomly divided into a training set and a validation set in an 8∶2 ratio, and R 4.3.1 was used to build five machine learning models: random forest, decision tree, extreme gradient boosting (XGBoost), logistic regression, and light gradient boosting machine (LightGBM). The predictive performance of the five models was evaluated using the area under the receiver operating characteristic curve (AUC). The marginal contribution of each feature to the caries prediction model was quantified with SHapley Additive exPlanations (SHAP).
RESULTS:
A total of 7 225 children who met the inclusion criteria were included. The caries rate of the first permanent molars was 54.96%. Multivariable logistic regression showed that sweet drinks, desserts and candy, snacking frequency, and snacking at bedtime after toothbrushing were associated with first permanent molar caries (P<0.05). The AUC values of the decision tree, logistic regression, LightGBM, random forest, and XGBoost models were 75.5%, 83.9%, 88.6%, 88.9%, and 90.1%, respectively. Among the one-hot encoded variables, two groups had the highest positive SHAP values. The first was high-frequency sugar intake (e.g., desserts and candy ≥2 times a day, mother's sugary diet ≥2 times a day). The second was poor oral hygiene habits (e.g., frequent snacking at bedtime after toothbrushing and irregular toothbrushing).
CONCLUSIONS:
The XGBoost algorithm showed good predictive performance for first permanent molar caries in 9-year-old children. Frequent sugar intake and poor oral hygiene habits have a strong positive impact on the risk of first permanent molar caries and are key drivers that can guide the formulation of targeted interventions.
Humans
;
Dental Caries/epidemiology*
;
Child
;
Machine Learning
;
China/epidemiology*
;
Molar
;
Risk Factors
;
Female
;
Logistic Models
;
Male
;
Decision Trees
;
Algorithms
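SHAP attributes a prediction to features through Shapley values: each feature's contribution is averaged over all coalitions of the other features. The sketch below computes exact Shapley values by enumerating coalitions for a toy model. It assumes an interventional value function, where features outside the coalition are taken from background rows. The feature names, the linear score, and the background rows are all invented for illustration. They do not reproduce the study's XGBoost model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one instance x by enumerating coalitions.

    The value of a coalition S is the model output averaged over the
    background rows, with the features in S fixed to x's values.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical one-hot features: [sweets >=2/day, snack after brushing, irregular brushing]
def score(f):
    # Invented linear risk score standing in for a fitted model
    return 0.2 + 0.5 * f[0] + 0.3 * f[1] + 0.1 * f[2]


background = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
child = [1, 1, 0]
print(shapley_values(score, child, background))
```

For a linear score, each value reduces to the feature weight times (x_i minus the background mean of feature i). Here that gives 0.25, 0.15 and -0.05. The values sum to the child's prediction minus the average background prediction. For tree ensembles, tools such as the `shap` package's TreeExplainer use dedicated algorithms instead of brute-force enumeration.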
3.Artificial intelligence-assisted design, mining, and modification of CRISPR-Cas systems.
Yufeng MAO ; Guangyun CHU ; Qingling LIANG ; Ye LIU ; Yi YANG ; Xiaoping LIAO ; Meng WANG
Chinese Journal of Biotechnology 2025;41(3):949-967
With the rapid advancement of synthetic biology, CRISPR-Cas systems have emerged as a powerful tool for gene editing, demonstrating significant potential in various fields, including medicine, agriculture, and industrial biotechnology. This review comprehensively summarizes the significant progress in applying artificial intelligence (AI) technologies to the design, mining, and modification of CRISPR-Cas systems. AI technologies, especially machine learning, have revolutionized sgRNA design by analyzing high-throughput sequencing data, thereby improving the editing efficiency and predicting off-target effects with high accuracy. Furthermore, this paper explores the role of AI in sgRNA design and evaluation, highlighting its contributions to the annotation and mining of CRISPR arrays and Cas proteins, as well as its potential for modifying key proteins involved in gene editing. These advancements have not only improved the efficiency and precision of gene editing but also expanded the horizons of genome engineering, paving the way for intelligent and precise genome editing.
CRISPR-Cas Systems/genetics*
;
Artificial Intelligence
;
Gene Editing/methods*
;
RNA, Guide, CRISPR-Cas Systems/genetics*
;
Machine Learning
;
Humans
;
Genetic Engineering/methods*
;
Synthetic Biology
4.Intelligent mining, engineering, and de novo design of proteins.
Cui LIU ; Zhenkun SHI ; Hongwu MA ; Xiaoping LIAO
Chinese Journal of Biotechnology 2025;41(3):993-1010
Natural biological components, shaped by long-term evolution, serve the survival of cells, but they often fail to meet the demands of engineered cells that must perform biological functions efficiently in specialized industrial environments. Enzymes, as biological catalysts, play a key role in biosynthetic pathways, significantly enhancing the rate and selectivity of biochemical reactions. However, the catalytic efficiency, stability, substrate specificity, and tolerance of natural enzymes often fall short of industrial production requirements. Therefore, discovering and modifying enzymes to suit specific biomanufacturing processes has become crucial. In recent years, artificial intelligence (AI) has played an increasingly important role in the discovery, evaluation, engineering, and de novo design of proteins. AI can accelerate the discovery and optimization of proteins by analyzing large amounts of bioinformatics data and by using machine learning and deep learning algorithms to predict protein functions and properties. Moreover, AI can assist researchers in designing new protein structures by simulating and predicting their performance under different conditions, providing guidance for protein design. This paper reviews the latest research advances in protein discovery, evaluation, engineering, and de novo design for biomanufacturing and explores the hot topics, challenges, and emerging technical methods in this field, aiming to provide guidance and inspiration for researchers in related fields.
Protein Engineering/methods*
;
Artificial Intelligence
;
Proteins/genetics*
;
Computational Biology
;
Machine Learning
;
Data Mining
;
Algorithms
;
Deep Learning
5.Intelligent design of transcription factor-based biosensors.
Chaoning LIANG ; La XIANG ; Shuangyan TANG
Chinese Journal of Biotechnology 2025;41(3):1011-1022
Transcription factor (TF)-based biosensors have been widely applied in metabolic engineering, synthetic biology, metabolite monitoring, and other areas. These biosensors are valued for their high orthogonality, modularity, and operability. However, most natural TFs show weak responses and low specificity and still require optimization to achieve the desired performance in applications. Herein, we comprehensively summarize recent advances in the engineering and optimization of TF-based biosensors with the assistance of computational simulation and artificial intelligence. This review covers two areas. The first is regulatory protein engineering aided by protein structure prediction and ligand-binding simulation. The second is the prediction of regulatory protein responses using mathematical models obtained by machine learning from mutagenesis data. Compared with conventional tools, computational simulation and artificial intelligence enable more accurate and rapid design and construction of biosensors. These technologies are therefore expected to greatly promote the development of novel biosensors for applications.
Biosensing Techniques/methods*
;
Transcription Factors/metabolism*
;
Artificial Intelligence
;
Protein Engineering/methods*
;
Computer Simulation
;
Synthetic Biology
;
Machine Learning
6.Machine learning-aided design of synthetic biological parts and circuits.
Chinese Journal of Biotechnology 2025;41(3):1023-1051
Synthetic biology is an emerging interdisciplinary field at the convergence of biology, engineering, and computer science. It employs a bottom-up approach to progressively design biological parts, devices, and circuits, aiming to create artificial biological systems not found in nature or to redesign existing biological systems for specific purposes. With the rapid development of the synthetic biology industry, there is an increasing demand for large complex genetic circuits. However, the traditional trial-and-error methods, heavily reliant on empirical knowledge, have limited efficiency and success rates of parts/circuits construction, thereby impeding the innovation and technology translation for synthetic biology. These limitations have prompted a paradigm shift from labor-intensive, experience-driven trial-and-error models towards standardized, intelligent engineering approaches. Machine learning, capable of uncovering hidden structures and relationships within biological data, offers robust support for the intelligent design of synthetic biological parts and genetic circuits. Here, we review commonly used machine learning algorithms and analyze their typical applications in designing biological parts (e.g., synthetic promoters, RNA regulatory elements, and transcription factors) and simple genetic circuits. Additionally, we discuss the primary challenges in machine learning-aided design and propose potential solutions. Lastly, we envision the future trend of integrating machine learning with synthetic biological system design, highlighting the importance of interdisciplinary collaboration.
Synthetic Biology/methods*
;
Machine Learning
;
Gene Regulatory Networks
;
Algorithms
7.Serum proteomics and machine learning unveil new diagnostic biomarkers for tuberculosis in adolescents and young adults.
Yu CHEN ; Hongxiang XU ; Yao TIAN ; Qian HE ; Xiaoyun ZHAO ; Guobin ZHANG ; Jianping XIE
Chinese Journal of Biotechnology 2025;41(4):1478-1489
Adolescents and young adults (AYAs) are one of the major populations susceptible to tuberculosis. However, little is known about the characteristics and diagnostic biomarkers of tuberculosis specific to this population. In this study, 81 AYAs were recruited, and a high-quality serum proteome of AYAs with tuberculosis was profiled by quantitative proteomics. The serum proteomic data showed that the relative abundance of hemoglobin and apolipoprotein was significantly reduced in patients with active tuberculosis (ATB). Pathway enrichment analysis showed that the proteins downregulated in the ATB group were mainly involved in antioxidant and cellular detoxification pathways, indicating extensive oxidative stress damage. Random forest (RF) and extreme gradient boosting (XGBoost) were used to evaluate protein importance, yielding a set of candidate proteins that can distinguish ATB from non-ATB. Support vector machine recursive feature elimination (SVM-RFE) analysis suggested that the combination of apolipoprotein A-I (APOA1), hemoglobin subunit beta (HBB), and hemoglobin subunit alpha-1 (HBA1) had the highest accuracy and sensitivity for diagnosing ATB. In addition, hemoglobin (HGB) and albumin (ALB) levels can be used as blood biochemical indicators reflecting changes in APOA1 and HBB protein levels. This study established the serum proteome landscape of AYAs with tuberculosis and identified new biomarkers for diagnosing tuberculosis in this population.
Humans
;
Proteomics/methods*
;
Biomarkers/blood*
;
Adolescent
;
Young Adult
;
Apolipoprotein A-I/blood*
;
Machine Learning
;
Tuberculosis/blood*
;
Proteome/analysis*
;
Male
;
Hemoglobins/analysis*
;
Female
;
Blood Proteins/analysis*
;
Adult
8.pLM4ACP: a model for predicting anticancer peptides based on machine learning and protein language models.
Yitong LIU ; Wenxin CHEN ; Juanjuan LI ; Xue CHI ; Xiang MA ; Yanqiong TANG ; Hong LI
Chinese Journal of Biotechnology 2025;41(8):3252-3261
Cancer is a serious global health problem and a major cause of human death. Conventional cancer treatments often risk impairing vital organ functions. Anticancer peptides (ACPs) are considered among the most promising therapeutic agents against common human cancers because of their small size, high specificity, and low toxicity. Experimental identification of ACPs is confined to the laboratory and is expensive and time-consuming. We therefore proposed pLM4ACP, a model for predicting ACPs based on machine learning and protein language models. In this model, the protein language model ProtT5 was used to extract features of ACPs, and the extracted features were fed into a support vector machine (SVM) classifier for optimization and performance evaluation. The model achieved significantly higher accuracy than other methods on the independent test set, with an overall accuracy of 0.763, an F1-score of 0.767, a Matthews correlation coefficient of 0.527, and an area under the curve of 0.827. This study constructs an efficient anticancer peptide prediction model based on protein language models, further advancing the application of artificial intelligence in the biomedical field and promoting the development of precision medicine and computational biology.
Machine Learning
;
Antineoplastic Agents/chemistry*
;
Humans
;
Peptides/chemistry*
;
Support Vector Machine
;
Algorithms
;
Computational Biology/methods*
;
Neoplasms/drug therapy*
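Accuracy, F1-score and the Matthews correlation coefficient (MCC) reported in the entry above all derive from the confusion matrix of the classifier on the test set. The sketch below shows the standard definitions, assuming ACP is the positive class. The counts in the example are invented and are not the study's results. The function name is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts.

    Positive class = anticancer peptide (ACP). Ratios with a zero
    denominator are returned as 0.0.
    """
    def ratio(a, b):
        return a / b if b else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(tp * tn - fp * fn,
                sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Invented counts for illustration only (not the study's test set)
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC uses all four cells of the matrix, which is why it is often reported alongside accuracy when classes may be imbalanced.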
9.Identification of high-risk preoperative blood indicators and baseline characteristics for multiple postoperative complications in rheumatoid arthritis patients undergoing total knee arthroplasty: a multi-machine learning feature contribution analysis.
Kejia ZHU ; Zhiyang HUANG ; Biao WANG ; Hang LI ; Yuangang WU ; Bin SHEN ; Yong NIE
Chinese Journal of Reparative and Reconstructive Surgery 2025;39(12):1532-1542
OBJECTIVE:
To explore, identify, and develop novel blood-based indicators using machine learning algorithms for accurate preoperative assessment and effective prediction of postoperative complication risks in patients with rheumatoid arthritis (RA) undergoing total knee arthroplasty (TKA).
METHODS:
A retrospective cohort study was conducted including RA patients who underwent unilateral TKA between January 2019 and December 2024. Inpatient and 30-day postoperative outpatient follow-up data were collected. Six machine learning algorithms, including decision tree, random forest, logistic regression, support vector machine, extreme gradient boosting, and light gradient boosting machine, were used to construct predictive models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), F1-score, accuracy, precision, and recall. SHapley Additive exPlanations (SHAP) values were employed to interpret and rank the importance of individual variables.
RESULTS:
According to the inclusion criteria, a total of 1 548 patients were enrolled. Ultimately, 18 preoperative indicators were identified as effective predictive features, and 8 postoperative complications were defined as prediction labels for inclusion in the study. Within 30 days after surgery, 453 patients (29.2%) developed one or more complications. Considering overall accuracy, precision, recall, and F1-score, the random forest model [AUC=0.930, 95% CI (0.910, 0.950)] and the extreme gradient boosting model [AUC=0.909, 95% CI (0.880, 0.938)] demonstrated the best predictive performance. SHAP analysis revealed that anti-cyclic citrullinated peptide antibody, C-reactive protein, rheumatoid factor, interleukin-6, body mass index, age, and smoking status made significant contributions to the overall prediction of postoperative complications.
CONCLUSION:
Machine learning-based models enable accurate prediction of postoperative complication risks among RA patients undergoing TKA. Inflammatory and immune-related blood biomarkers, such as anti-cyclic citrullinated peptide antibody, C-reactive protein, rheumatoid factor, and interleukin-6, play key predictive roles, highlighting their potential value in perioperative risk stratification and individualized management.
Humans
;
Arthroplasty, Replacement, Knee/adverse effects*
;
Arthritis, Rheumatoid/blood*
;
Machine Learning
;
Postoperative Complications/blood*
;
Female
;
Male
;
Retrospective Studies
;
Middle Aged
;
Aged
;
Risk Factors
;
Preoperative Period
;
C-Reactive Protein/analysis*
;
Risk Assessment
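The entry above reports AUCs with 95% confidence intervals but does not state how the intervals were computed. A percentile bootstrap is one common choice, and DeLong's method is another. The sketch below assumes a percentile bootstrap over patients. The data, function names and default settings (2000 resamples, fixed seed) are hypothetical.

```python
import random


def auc(scores, labels):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes; skip one-class resamples
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    k = int(n_boot * alpha / 2)
    return auc(scores, labels), (stats[k], stats[n_boot - k - 1])


# Invented predicted risks; 1 = any complication within 30 days
risk = [0.9, 0.85, 0.7, 0.65, 0.6, 0.4, 0.55, 0.3, 0.2, 0.1, 0.45, 0.35]
complication = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
point, (low, high) = bootstrap_auc_ci(risk, complication)
print(f"AUC={point:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

Resampling whole patients, rather than scores and labels separately, keeps each prediction paired with its observed outcome. This pairing is what the interval is meant to reflect.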
10.Machine learning models established to distinguish OA and RA based on immune factors in the knee joint fluid.
Qin LIANG ; Lingzhi ZHAO ; Yan LU ; Rui ZHANG ; Qiaolin YANG ; Hui FU ; Haiping LIU ; Lei ZHANG ; Guoduo LI
Chinese Journal of Cellular and Molecular Immunology 2025;41(4):331-338
Objective Machine learning models were established to distinguish osteoarthritis (OA) from rheumatoid arthritis (RA) based on 25 indicators of the knee joint fluid, including immune factors, nucleated cell counts and classification, and smear results. Methods A total of 100 patients with OA and 40 patients with RA scheduled for total knee arthroplasty were enrolled. Knee joint fluid was collected from each patient preoperatively. Nucleated cells were counted and classified. The levels of immune factors, including tumor necrosis factor alpha (TNF-α), interleukin-1 beta (IL-1β), IL-6, IL-8, IL-15, matrix metalloproteinase 3 (MMP3), MMP9, MMP13, rheumatoid factor (RF), serum amyloid A (SAA), and C-reactive protein (CRP), among others, were measured. Smears were prepared for microscopic classification. Independent influencing factors for OA or RA were identified using univariate binary logistic regression, LASSO regression, and multivariate binary logistic regression. Based on these independent factors, three machine learning models were constructed: logistic regression, random forest, and support vector machine. Receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) were used to evaluate and compare the models. Results Five indicators in the knee joint fluid were identified that distinguish OA from RA: IL-1β (odds ratio (OR)=10.512, 95% confidence interval (95% CI) 1.048-105.42, P=0.045), IL-6 (OR=1.007, 95% CI 1.001-1.014, P=0.022), MMP9 (OR=3.202, 95% CI 1.235-8.305, P=0.017), MMP13 (OR=1.002, 95% CI 1-1.004, P=0.049), and RF (OR=1.091, 95% CI 1.01-1.179, P=0.026). According to the ROC curves, calibration curves, and DCA, the random forest model had the highest accuracy (0.979), sensitivity (0.98), and area under the curve (AUC, 0.996, 95% CI 0.991-1). It showed good validity and feasibility, and its discriminative ability was better than that of the other two models.
Conclusion The machine learning model based on immune factors in the knee joint fluid is of significant value in distinguishing OA from RA. It provides an important reference for early clinical differential diagnosis, prevention, and treatment of OA and RA.
Humans
;
Arthritis, Rheumatoid/metabolism*
;
Machine Learning
;
Male
;
Female
;
Middle Aged
;
Aged
;
Synovial Fluid/immunology*
;
Osteoarthritis, Knee/metabolism*
;
Knee Joint/metabolism*
;
ROC Curve
;
Diagnosis, Differential
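The odds ratios in the entry above come from multivariate binary logistic regression. Under the usual Wald approach, OR = exp(beta) and the 95% CI is exp(beta ± 1.96 × SE). This is an assumption, since the abstract does not state how the intervals were computed. Because the Wald interval is symmetric on the log scale, beta and its SE can be recovered from a reported interval. The sketch below does this for the reported IL-1β values. The function names are hypothetical.

```python
from math import exp, log

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def odds_ratio_ci(beta, se, z=Z_975):
    """Odds ratio exp(beta) with a Wald CI exp(beta +/- z * se)."""
    return exp(beta), (exp(beta - z * se), exp(beta + z * se))


def beta_se_from_ci(ci_low, ci_high, z=Z_975):
    """Recover beta and its SE from a reported Wald CI.

    On the log scale the Wald CI is symmetric about beta, so beta is the
    midpoint of the log bounds and SE is their half-width divided by z.
    """
    lo, hi = log(ci_low), log(ci_high)
    return (lo + hi) / 2, (hi - lo) / (2 * z)


# IL-1β as reported in the entry above: OR = 10.512, 95% CI 1.048-105.42
beta, se = beta_se_from_ci(1.048, 105.42)
odds, (ci_lo, ci_hi) = odds_ratio_ci(beta, se)
print(round(odds, 2), round(se, 3), round(ci_lo, 3), round(ci_hi, 2))
```

The recovered OR is about 10.51, consistent with the reported 10.512 up to rounding of the interval bounds. The implied log-odds SE of roughly 1.18 explains why the IL-1β estimate is large but imprecise.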
