Screening anti-fibrosis Chinese medicinal compounds based on machine learning
10.3969/j.issn.1006-2157.2019.01.006
- VernacularTitle:基于机器学习的抗纤维化中药化合物筛选研究
- Author:
Xiting WANG 1;
Yu LI;
Lan ZHANG;
Meng LIU;
Cheng LI;
Qiushi YANG;
Xiaoyi HANG;
Yi LIU
- Author Information:
1. School of Traditional Chinese Medicine, Beijing University of Chinese Medicine (北京中医药大学中医学院)
- Keywords:
organ fibrosis;
machine learning;
molecular fingerprinting;
Chinese medicinal compound screening
- From:
Journal of Beijing University of Traditional Chinese Medicine
2019;42(1):30-36
- Country: China
- Language:Chinese
- Abstract:
Objective: To establish a new virtual screening model for predicting Chinese medicinal compounds with anti-fibrosis effects, and to verify the predictive performance of the model.

Methods: Random forest (RF) and gradient boosting decision tree (GBDT) algorithms were used for dimension reduction and feature optimization of molecular fingerprints. A hybrid feature optimization-machine learning model was established, in which the optimized features were input into logistic regression (LR) and artificial neural network (ANN) algorithms for training. Precision, recall and F1 value were used to evaluate the performance of the various model combinations, and the virtual screening predictive model was selected according to the results of this evaluation. The anti-fibrosis activity predictions for Chinese medicinal compounds from the virtual screening predictive model were then compared with those from a molecular docking model to further verify the predictive performance of the virtual screening model.

Results: The RF model had a precision of 0.76, a recall of 0.75 and an F1 value of 0.74 (AUC=0.818). The GBDT model had a precision of 0.76, a recall of 0.74 and an F1 value of 0.72 (AUC=0.829). The ANN model had a precision of 0.75, a recall of 0.75 and an F1 value of 0.75 (AUC=0.802). The RF+LR model had a precision of 0.77, a recall of 0.76 and an F1 value of 0.75 (AUC=0.840). The RF+LR model had a precision of 0.74, a recall of 0.84 and an F1 value of 0.79 (AUC=0.850). The GBDT+LR model had a precision of 0.80, a recall of 0.80 and an F1 value of 0.79 (AUC=0.872). The GBDT+ANN model had a precision of 0.73, a recall of 0.91 and an F1 value of 0.81 (AUC=0.837). The molecular docking activity results for Chinese medicinal compounds including curcumin, glycyrrhizic acid, hydroxysafflor yellow A, emodin and gypenoside were in accordance with the predictions of the virtual screening predictive model.

Conclusion: The model based on RF+LR is better than the models established with the other methods. Comparison with the molecular docking model shows that the virtual screening predictive model performs well in predicting the activity of Chinese medicinal compounds. The method is capable of high-throughput screening and can compensate for the limited compound screening efficiency of molecular docking. It provides a new approach to the virtual screening of Chinese medicinal compounds with anti-fibrosis effects.
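
The precision, recall and F1 values used above to compare the models can be illustrated with a minimal Python sketch. It assumes the standard binary definitions computed for a single positive ("active") class; the abstract does not say whether the reported figures are per-class or averaged across classes, so this is not necessarily the authors' exact calculation. The function name `precision_recall_f1` and the example labels are hypothetical and are not taken from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`.

    Standard binary definitions (an assumption; the abstract does not
    state the averaging scheme used for the reported values).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for ten compounds: 1 = active, 0 = inactive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so a model with high recall but lower precision can still reach a high F1 value.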