A multimodal contrastive learning framework for predicting P-glycoprotein substrates and inhibitors
10.1016/j.jpha.2025.101313
- Author:Yixue ZHANG 1; Jialu WU; Yu KANG; Tingjun HOU
Author Information
1. College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Polytechnic Institute of Zhejiang University, Zhejiang University, Hangzhou, 310015, China
- Publication Type:Journal Article
- Keywords:P-glycoprotein; Deep learning; Multimodal fusion; Graph contrastive learning
- From:Journal of Pharmaceutical Analysis 2025;15(8):1810-1824
- Country:China
- Language:English
- Abstract:
P-glycoprotein (P-gp) is a transmembrane protein widely involved in the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs within the human body. Accurate prediction of P-gp inhibitors and substrates is crucial for drug discovery and toxicological assessment. However, existing models rely on limited molecular information, which leads to suboptimal performance in predicting P-gp inhibitors and substrates. To overcome this challenge, we compiled an extensive dataset from public databases and literature, consisting of 5,943 P-gp inhibitors and 4,018 substrates, notable for their high quantity, quality, and structural uniqueness. In addition, we curated two external test sets to validate the model's generalization capability. Subsequently, we developed a multimodal graph contrastive learning (GCL) model for the prediction of P-gp inhibitors and substrates (MC-PGP). This framework integrates three types of features, derived from Simplified Molecular Input Line Entry System (SMILES) sequences, molecular fingerprints, and molecular graphs, using an attention-based fusion strategy to generate a unified molecular representation. Furthermore, we employed a GCL approach to enhance structural representations by aligning local and global structures. Extensive experimental results highlight the superior performance of MC-PGP, which achieves improvements in the area under the receiver operating characteristic curve (AUC-ROC) of 9.82% and 10.62% on the external P-gp inhibitor and external P-gp substrate datasets, respectively, compared with 12 state-of-the-art methods. Furthermore, the interpretability analysis of all three molecular feature types offers comprehensive and complementary insights, demonstrating that MC-PGP effectively identifies key functional groups involved in P-gp interactions. These chemically intuitive insights provide valuable guidance for the design and optimization of drug candidates.
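The abstract does not specify how the attention-based fusion is implemented. As a rough illustration only, the sketch below shows one common form of the idea: scaled dot-product attention over three modality embeddings, producing a weighted average. The function names, the toy embedding values, and the `query` vector (which would be learned in a real model) are all assumptions for illustration and are not taken from MC-PGP.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(embeddings, query):
    """Fuse equal-length modality embeddings into one vector.

    Each embedding is scored against `query` with a scaled dot product,
    the scores are normalized with softmax, and the fused vector is the
    attention-weighted sum of the embeddings.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, emb)) / math.sqrt(dim)
        for emb in embeddings
    ]
    weights = softmax(scores)
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return fused, weights

# Toy 4-dimensional embeddings (made-up values, one per modality).
smiles_emb = [0.2, 0.1, 0.7, 0.0]
fingerprint_emb = [0.9, 0.3, 0.1, 0.4]
graph_emb = [0.5, 0.5, 0.5, 0.5]

# Hypothetical query vector; a trained model would learn this.
query = [1.0, 0.0, 0.0, 0.0]

fused, weights = attention_fuse([smiles_emb, fingerprint_emb, graph_emb], query)
print("attention weights:", weights)
print("fused representation:", fused)
```

In this toy setup the weights sum to 1, and the modality whose embedding aligns best with the query (here the fingerprint vector) contributes most to the fused representation.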