Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation Sites.
10.1016/j.gpb.2018.08.004
- Author:
Zhen CHEN 1; Ningning HE 1; Yu HUANG 2; Wen Tao QIN 3; Xuhan LIU 4; Lei LI 5,6,7
Author Information
1. School of Basic Medicine, Qingdao University, Qingdao 266021, China.
2. School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China.
3. Department of Biochemistry, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario N6A 5C1, Canada.
4. Department of Information Technology, Beijing Oriental Yamei Gene Technology Institute Co. Ltd., Beijing 100078, China. Electronic address: xuhanliu@amagene.cn.
5. School of Basic Medicine, Qingdao University, Qingdao 266021, China
6. School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China
7. Qingdao Cancer Institute, Qingdao University, Qingdao 266021, China. Electronic address: leili@qdu.edu.cn.
- Publication Type:Journal Article
- Keywords:
Deep learning;
LSTM;
Malonylation;
Random forest;
Recurrent neural network
- MeSH:
Amino Acid Sequence/genetics;
Amino Acids;
Animals;
Deep Learning;
Forecasting/methods;
Lysine/chemistry;
Machine Learning;
Malonates/chemistry;
Protein Processing, Post-Translational/genetics
- From:
Genomics, Proteomics & Bioinformatics
2018;16(6):451-459
- Country:China
- Language:English
- Abstract:
As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates is an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings and better than a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which combines LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to the currently available malonylation predictors. Additionally, it achieves promising performance at a low false positive rate, which is highly useful in practical prediction. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
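The contrast between the word-embedding and one-hot inputs mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the 21-symbol alphabet with `-` as padding, the 13-residue window, the embedding size `EMBED_DIM`, and the randomly initialised lookup table (in a trained model the table would be learned together with the LSTM).

```python
import random

# Assumed alphabet: the 20 standard amino acids plus '-' as a padding symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def to_indices(peptide):
    """Map each residue to an integer token, the input an embedding layer expects."""
    return [INDEX[aa] for aa in peptide]


def to_one_hot(peptide):
    """Map each residue to a sparse 0/1 vector of length len(ALPHABET)."""
    size = len(ALPHABET)
    return [[1 if j == INDEX[aa] else 0 for j in range(size)] for aa in peptide]


def embed(indices, table):
    """Look up one dense vector per token from the embedding table."""
    return [table[i] for i in indices]


EMBED_DIM = 5  # hypothetical embedding size, chosen only for this sketch
rng = random.Random(0)
# Random stand-in for a learned embedding table: one row per alphabet symbol.
table = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in ALPHABET]

# Hypothetical sequence window centred on a candidate lysine (K).
peptide = "--MAVLKAGSSTE"

indices = to_indices(peptide)
one_hot = to_one_hot(peptide)
dense = embed(indices, table)
```

Under these assumptions, each residue becomes a 21-dimensional sparse vector in the one-hot case and a 5-dimensional dense vector in the embedding case; either sequence of vectors could then be fed to a recurrent layer.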