FragAnchor: a large-scale predictor of glycosylphosphatidylinositol anchors in eukaryote protein sequences by qualitative scoring.

Guylaine POISSON; Cedric CHAUVE; Xin CHEN; Anne BERGERON

Authors: Guylaine POISSON¹; Cedric CHAUVE; Xin CHEN; Anne BERGERON
Author Information

1. Department of Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, HI 96822, USA. guylaine@hawaii.edu
Publication Type: Journal Article
MeSH: Amino Acid Sequence; Computational Biology/methods; Databases, Protein; Eukaryotic Cells/chemistry; Glycosylphosphatidylinositols/chemistry, isolation & purification, metabolism; Humans; Hydrophobic and Hydrophilic Interactions; Markov Chains; Models, Genetic; Molecular Sequence Data; Neural Networks (Computer); Predictive Value of Tests; Protein Processing, Post-Translational; Proteome/analysis; Sensitivity and Specificity; Sequence Analysis, Protein
From: Genomics, Proteomics & Bioinformatics 2007;5(2):121-130
Country: China
Language: English
Abstract: A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins among the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). First, the NN selects potential GPI-anchored proteins in a dataset; the HMM then parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins is between 0.21% and 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, which makes it less sensitive to the background noise found in databases and to possibly incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available at [see text].
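
The two-stage design described in the abstract (a selector that filters candidates, then a refiner whose score is mapped to a qualitative class, both looking only at the C-terminus) can be illustrated with a minimal Python sketch. This is not FragAnchor's implementation: the window length, the hydrophobicity-based stand-ins for the NN and the HMM, the thresholds, and every class name other than "highly probable" are assumptions made purely for illustration.

```python
from typing import Callable, Optional

TAIL_LENGTH = 50  # hypothetical window size; the abstract does not state one
HYDROPHOBIC = set("AVLIMFWC")  # hypothetical residue set for the toy scorers


def c_terminal_tail(sequence: str, length: int = TAIL_LENGTH) -> str:
    """Keep only the C-terminal end, so upstream errors are ignored."""
    return sequence[-length:]


def hydrophobic_fraction(tail: str) -> float:
    """Fraction of hydrophobic residues among the last 20 residues."""
    end = tail[-20:]
    if not end:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in end) / len(end)


def toy_selector(tail: str) -> bool:
    """Stand-in for the NN stage: accept or reject a candidate tail."""
    return hydrophobic_fraction(tail) >= 0.5


def toy_refiner(tail: str) -> float:
    """Stand-in for the HMM stage: return a score in [0, 1]."""
    return hydrophobic_fraction(tail)


def qualitative_label(score: float) -> str:
    """Map a numeric score to a qualitative class (hypothetical thresholds)."""
    if score >= 0.8:
        return "highly probable"
    if score >= 0.65:
        return "probable"
    return "weakly probable"


def predict(
    sequence: str,
    selector: Callable[[str], bool] = toy_selector,
    refiner: Callable[[str], float] = toy_refiner,
) -> Optional[str]:
    """Run selector, then refiner; return a label, or None if rejected."""
    tail = c_terminal_tail(sequence.upper())
    if not selector(tail):
        return None
    return qualitative_label(refiner(tail))
```

Because `predict` takes the selector and refiner as arguments, trained models could be substituted for the toy functions without changing the control flow. Only candidates accepted by the first stage reach the second, which mirrors the tandem arrangement the abstract describes.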