1. SuccSite: Incorporating Amino Acid Composition and Informative k-spaced Amino Acid Pairs to Identify Protein Succinylation Sites
Kao HUI-JU ; Nguyen VAN-NUI ; Huang KAI-YAO ; Chang WEN-CHI ; Lee TZONG-YI
Genomics, Proteomics & Bioinformatics 2020;18(2):208-219
Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to the lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty in experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylated sites based on amino acid composition. Using experimentally verified lysine succinylated sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, support vector machine, to train a predictive model with not only the amino acid composition, but also the composition of k-spaced amino acid pairs. After the selection of the best model using a ten-fold cross-validation approach, the selected model significantly outperformed existing tools based on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
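The two feature families named in the abstract, amino acid composition and the composition of k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. This is not the SuccSite implementation: the window length, the padding character "-", the choice of k_max, the normalization by the number of valid pairs, and all function names are assumptions made for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(window):
    """Fraction of each standard residue in the window (20 values)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / total for a in AMINO_ACIDS]


def k_spaced_pair_composition(window, k):
    """Fraction of each ordered residue pair with k residues in between (400 values)."""
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(window) - k - 1):
        pair = window[i] + window[i + k + 1]
        if pair in counts:  # skip pairs that contain padding or unknown residues
            counts[pair] += 1
    total = sum(counts.values()) or 1  # assumption: normalize by valid pairs
    return [counts[p] / total for p in PAIRS]


def encode_window(window, k_max=2):
    """Concatenate composition features with pair features for k = 0..k_max."""
    features = aa_composition(window)
    for k in range(k_max + 1):
        features.extend(k_spaced_pair_composition(window, k))
    return features


# Hypothetical 11-residue window centred on a lysine (K); "-" is padding.
example = "--MAEKLVGAR"
vector = encode_window(example, k_max=2)
print(len(vector))  # 20 + 3 * 400 = 1220
```

A vector of this kind, computed for each candidate lysine, is the sort of input a support vector machine could be trained on; which values of k are informative would have to be determined empirically, for example by cross-validation.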
2. Artificial intelligence predicts direct-acting antivirals failure among hepatitis C virus patients: A nationwide hepatitis C virus registry program
Ming-Ying LU ; Chung-Feng HUANG ; Chao-Hung HUNG ; Chi-Ming TAI ; Lein-Ray MO ; Hsing-Tao KUO ; Kuo-Chih TSENG ; Ching-Chu LO ; Ming-Jong BAIR ; Szu-Jen WANG ; Jee-Fu HUANG ; Ming-Lun YEH ; Chun-Ting CHEN ; Ming-Chang TSAI ; Chien-Wei HUANG ; Pei-Lun LEE ; Tzeng-Hue YANG ; Yi-Hsiang HUANG ; Lee-Won CHONG ; Chien-Lin CHEN ; Chi-Chieh YANG ; Sheng-Shun YANG ; Pin-Nan CHENG ; Tsai-Yuan HSIEH ; Jui-Ting HU ; Wen-Chih WU ; Chien-Yu CHENG ; Guei-Ying CHEN ; Guo-Xiong ZHOU ; Wei-Lun TSAI ; Chien-Neng KAO ; Chih-Lang LIN ; Chia-Chi WANG ; Ta-Ya LIN ; Chih-Lin LIN ; Wei-Wen SU ; Tzong-Hsi LEE ; Te-Sheng CHANG ; Chun-Jen LIU ; Chia-Yen DAI ; Jia-Horng KAO ; Han-Chieh LIN ; Wan-Long CHUANG ; Cheng-Yuan PENG ; Chun-Wei TSAI ; Chi-Yi CHEN ; Ming-Lung YU
Clinical and Molecular Hepatology 2024;30(1):64-79
Background/Aims:
Despite the high efficacy of direct-acting antivirals (DAAs), approximately 1–3% of hepatitis C virus (HCV) patients fail to achieve a sustained virological response. We conducted a nationwide study to investigate risk factors associated with DAA treatment failure and applied machine-learning algorithms to identify patients who may fail to respond to DAA therapy.
Methods:
We analyzed the Taiwan HCV Registry Program database to explore predictors of DAA failure in HCV patients. Fifty-five host and virological features were assessed using multivariate logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and artificial neural network. The primary outcome was undetectable HCV RNA at 12 weeks after the end of treatment.
Results:
The training (n=23,955) and validation (n=10,346) datasets had similar baseline demographics, with an overall DAA failure rate of 1.6% (n=538). Multivariate logistic regression analysis revealed that liver cirrhosis, hepatocellular carcinoma, poor DAA adherence, and higher hemoglobin A1c were significantly associated with virological failure. XGBoost outperformed the other algorithms and logistic regression models, with an area under the receiver operating characteristic curve of 1.000 in the training dataset and 0.803 in the validation dataset. The top five predictors of treatment failure were HCV RNA, body mass index, α-fetoprotein, platelets, and FIB-4 index. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of the XGBoost model (cutoff value=0.5) were 99.5%, 69.7%, 99.9%, 97.4%, and 99.5%, respectively, for the entire dataset.
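The five performance measures reported above all derive from a confusion matrix obtained by thresholding predicted probabilities at a cutoff. The sketch below shows that arithmetic under stated assumptions: treatment failure is taken as the positive class, a probability at or above the cutoff counts as a predicted failure, and the probabilities and labels are invented for illustration and are not registry data. The function names are hypothetical.

```python
def confusion_counts(probabilities, labels, cutoff=0.5):
    """Count TP, FP, TN, FN; label 1 = treatment failure (assumed positive class)."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= cutoff else 0  # assumption: >= cutoff means failure
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # failures correctly flagged
        "specificity": ratio(tn, tn + fp),  # responders correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged patients who truly fail
        "npv": ratio(tn, tn + fn),          # cleared patients who truly respond
    }


# Invented example values, not taken from the study.
probs = [0.9, 0.7, 0.4, 0.2, 0.1, 0.6]
truth = [1, 1, 1, 0, 0, 0]
print(binary_metrics(*confusion_counts(probs, truth, cutoff=0.5)))
```

With an outcome as rare as the 1.6% failure rate reported here, accuracy and negative predictive value are dominated by the majority class, so sensitivity and positive predictive value tend to be the more informative figures when judging how well failures are detected.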
Conclusions:
Machine learning algorithms effectively provide risk stratification for DAA failure and additional information on the factors associated with DAA failure.