MF-SuP-pKa: Multi-fidelity modeling with subgraph pooling mechanism for pKa prediction.
- DOI: 10.1016/j.apsb.2022.11.010
- Authors: Jialu WU (1); Yue WAN (2); Zhenxing WU (1); Shengyu ZHANG (2); Dongsheng CAO (3); Chang-Yu HSIEH (1); Tingjun HOU (1)
- Author Information:
1. Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
2. Tencent Quantum Laboratory, Tencent, Shenzhen 518057, China.
3. Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, China.
- Publication Type: Journal Article
- Keywords:
Data augmentation;
Graph neural network;
Multi-fidelity learning;
Subgraph pooling;
pKa prediction
- From:
Acta Pharmaceutica Sinica B
2023;13(6):2572-2584
- Country: China
- Language: English
- Abstract:
The acid-base dissociation constant (pKa) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methodologies for pKa prediction still suffer from limited applicability domains and a lack of chemical insight. Here we present MF-SuP-pKa (multi-fidelity modeling with subgraph pooling for pKa prediction), a novel pKa prediction model that utilizes subgraph pooling, multi-fidelity learning and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pKa prediction. To overcome the scarcity of accurate pKa data, low-fidelity data (computational pKa) were used to fit the high-fidelity data (experimental pKa) through transfer learning. The final MF-SuP-pKa model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pKa achieves performance superior to state-of-the-art pKa prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pKa achieves 23.83% and 20.12% improvement in mean absolute error (MAE) on the acidic and basic sets, respectively.
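The subgraph pooling idea described in the abstract, extracting the chemical environment within a few bonds of an ionization site and pooling it into a fixed-size vector, can be illustrated with a minimal, framework-free sketch. The adjacency-list format, the `k_hop_subgraph` and `subgraph_pool` names, and the use of simple mean pooling over raw atom features are illustrative assumptions for this sketch; the actual model pools learned GNN embeddings with a knowledge-aware strategy.

```python
from collections import deque

def k_hop_subgraph(adj, center, k):
    """Collect atoms within k bonds of the ionization site via BFS.
    `adj` maps each atom index to its bonded neighbors (hypothetical input format)."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == k:          # stop expanding past the k-hop radius
            continue
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return sorted(depth)

def subgraph_pool(features, nodes):
    """Mean-pool per-atom feature vectors over the extracted subgraph."""
    dim = len(features[0])
    return [sum(features[n][d] for n in nodes) / len(nodes) for d in range(dim)]

# Toy 4-atom chain; atom 0 plays the role of the ionization site.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
nodes = k_hop_subgraph(adj, center=0, k=2)   # atoms within 2 bonds: [0, 1, 2]
pooled = subgraph_pool(feats, nodes)         # fixed-size environment vector
```

Varying `k` trades off a purely local description of the ionization site against a more global view of the molecule, which is the lever the paper's local/global environment capture turns on.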