1. GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models.
Zihan ZHOU ; Yang YU ; Chengji YANG ; Leyan CAO ; Shaoying ZHANG ; Junnan LI ; Yingnan ZHANG ; Huayun HAN ; Guoliang SHI ; Qiansen ZHANG ; Juwen SHEN ; Huaiyu YANG
Journal of Pharmaceutical Analysis 2025;15(8):101302-101302
Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels from a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
2. GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models
Zihan ZHOU ; Yang YU ; Chengji YANG ; Leyan CAO ; Shaoying ZHANG ; Junnan LI ; Yingnan ZHANG ; Huayun HAN ; Guoliang SHI ; Qiansen ZHANG ; Juwen SHEN ; Huaiyu YANG
Journal of Pharmaceutical Analysis 2025;15(8):1800-1809
Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels from a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
