GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models.
- Author: Zihan ZHOU¹; Yang YU²; Chengji YANG¹; Leyan CAO¹; Shaoying ZHANG¹; Junnan LI¹; Yingnan ZHANG¹; Huayun HAN¹; Guoliang SHI²; Qiansen ZHANG¹; Juwen SHEN¹; Huaiyu YANG¹
- Publication Type: Journal Article
- Keywords: Artificial intelligence; GPT2; Ion channel; Protein language model; Representation learning
- From: Journal of Pharmaceutical Analysis 2025;15(8):101302-101302
- Country: China
- Language: English
- Abstract: Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels in a test set containing approximately 239 times more non-ion-channel proteins than ion channels. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
