Proteomics Protein Identification Philosophy

Introduction: When one thinks of identification in a chemical sense, one thinks of knowing a compound's absolute chemical composition to the last atom, and not only the composition but also the arrangement of those atoms in space. In the following tutorial we will demonstrate that many protein identification techniques are really peptide identification techniques, and that these identifications are in reality just correlations. To quote the author, "Most of these techniques are peptide homology searches with a parent mass constraint!" For example, it is possible to match a peptide by parent mass, correlate most of its MS/MS fragment ions with a high score, and still have several amino acid residues within that peptide swapped in order; see Figure 1 below. As shown in Figure 1, the MS/MS spectrum correlates better to the actin eight sequence DFEAEMQTAASSSAL in the bottom plot than it does to the actin one sequence DFEQEMATAASSSAL in the upper plot. If the actin eight sequence were not in the database, we would readily identify this protein as actin one with high confidence! Edman sequencing would never make this mistake. With MS identification it is also possible to get the sequence wrong because the isobaric amino acids leucine/isoleucine, and the nearly isobaric amino acids lysine/glutamine, may be substituted for one another in the sequence database. These are just some of the caveats encountered in protein identification with mass spectrometry data.
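To put numbers on the isobaric caveat, here is a minimal Python sketch using standard monoisotopic residue masses. The 1500 Da peptide mass is an arbitrary value chosen only for illustration, and the function name is our own.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "L": 113.08406,  # leucine
    "I": 113.08406,  # isoleucine
    "K": 128.09496,  # lysine
    "Q": 128.05858,  # glutamine
}


def ppm_to_resolve(delta_da, peptide_mass_da):
    """Express a mass difference as parts per million of the peptide mass."""
    return delta_da / peptide_mass_da * 1e6


leu_ile_delta = abs(RESIDUE_MASS["L"] - RESIDUE_MASS["I"])
lys_gln_delta = abs(RESIDUE_MASS["K"] - RESIDUE_MASS["Q"])

# Hypothetical peptide mass, chosen only for illustration.
EXAMPLE_PEPTIDE_MASS = 1500.0

print(f"Leu/Ile difference: {leu_ile_delta:.5f} Da")
print(f"Lys/Gln difference: {lys_gln_delta:.5f} Da")
print(f"Lys/Gln at {EXAMPLE_PEPTIDE_MASS:.0f} Da: "
      f"{ppm_to_resolve(lys_gln_delta, EXAMPLE_PEPTIDE_MASS):.1f} ppm")
```

Leucine and isoleucine cannot be separated by mass at all, while a lysine/glutamine swap shifts the mass by only about 0.036 Da, so it shows up only when the mass measurement is accurate enough.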



Figure 1. This one spectrum was correlated to two similar sequences from the same species in a protein database. Both correlations scored very high with the program Sequest™. The bottom correlation, actin eight, has a slightly higher score and, as shown to the left, better fragment peak coverage, so it is a better match than the upper correlation, actin one. Note that the amino acid residues Q and A have been switched in order, yielding two sequences that both correlate very well. Of course, both of these Asp-N cleaved peptides have the same parent mass and many of the same "b" and "y" ions.
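The claim in the caption can be checked with a short calculation. The following Python sketch is only an illustration, not how Sequest scores spectra: it computes theoretical singly charged "b" and "y" ions for the two sequences from standard monoisotopic residue masses and counts how many coincide. The function names and the 0.001 Da tolerance are our own assumptions.

```python
# Standard monoisotopic residue masses (Da) for the residues in these peptides.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "F": 147.06841,
    "L": 113.08406, "M": 131.04049, "Q": 128.05858, "S": 87.03203,
    "T": 101.04768,
}
WATER = 18.01056
PROTON = 1.00728

ACTIN_ONE = "DFEQEMATAASSSAL"
ACTIN_EIGHT = "DFEAEMQTAASSSAL"


def parent_mass(seq):
    """Neutral monoisotopic peptide mass."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def b_ions(seq):
    """Singly charged b ions, b1 through b(n-1)."""
    total, ions = 0.0, []
    for residue in seq[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(seq):
    """Singly charged y ions, y1 through y(n-1)."""
    total, ions = 0.0, []
    for residue in reversed(seq[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total + WATER + PROTON)
    return ions


def count_shared(ions_a, ions_b, tol=0.001):
    """Count ions of the same index whose m/z agree within tol (Da)."""
    return sum(1 for a, b in zip(ions_a, ions_b) if abs(a - b) <= tol)


shared_b = count_shared(b_ions(ACTIN_ONE), b_ions(ACTIN_EIGHT))
shared_y = count_shared(y_ions(ACTIN_ONE), y_ions(ACTIN_EIGHT))
mass_gap = abs(parent_mass(ACTIN_ONE) - parent_mass(ACTIN_EIGHT))

print(f"Parent mass difference: {mass_gap:.5f} Da")
print(f"Shared b ions: {shared_b} of {len(ACTIN_ONE) - 1}")
print(f"Shared y ions: {shared_y} of {len(ACTIN_ONE) - 1}")
```

Only the fragments that break the backbone between the two swapped positions (b4 to b6 and y9 to y11) differ; every other theoretical fragment is identical, which is why both sequences can score so well against one spectrum.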

With mass spectrometry data we are at the mercy of the sequence database. If the protein sequence has not been entered in the database, then we obviously will not be able to identify it with MS matching techniques. Protein databases are constructed by humans, and sadly not by divine inspiration. Researchers contribute to public databases, usually through protein Edman sequencing efforts made in their research projects. Other sequences are garnered directly from publications and patents. These sequences are prone to a number of errors: human sequencing errors, human transcription errors, typographical errors, and database curation errors. In addition, it is difficult to know the theoretical absolute chemical composition of large proteins because of the myriad modifications that occur within them. Degradations are common: oxidation of methionine, deamidation of asparagine and glutamine, backbone cleavages, and many other amino acid modifications can and do occur. Post-translational modifications are also common, for example glycosylations and phosphorylations, which can shift both mass and charge. For a larger, more complete list of modifications the reader is referred to UNIMOD, "A Database of Protein Modifications."
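As a rough illustration of how such modifications move a peptide away from its database mass, the Python sketch below applies the standard monoisotopic mass shifts for three of the modifications named above. It models only the mass change, not any change in charge state; the 1500 Da peptide mass and the function names are assumptions made for the example.

```python
# Standard monoisotopic mass shifts in daltons.
MOD_DELTA = {
    "oxidation": 15.994915,        # e.g. methionine
    "deamidation": 0.984016,       # asparagine or glutamine
    "phosphorylation": 79.966331,  # e.g. serine, threonine, tyrosine
}
PROTON = 1.007276


def mz(neutral_mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def modified_mz(neutral_mass, charge, mods):
    """m/z after adding the mass shift of each named modification."""
    shifted = neutral_mass + sum(MOD_DELTA[name] for name in mods)
    return mz(shifted, charge)


# Hypothetical unmodified peptide mass, chosen only for illustration.
EXAMPLE_MASS = 1500.0

for name in MOD_DELTA:
    shift = modified_mz(EXAMPLE_MASS, 2, [name]) - mz(EXAMPLE_MASS, 2)
    print(f"{name}: +{shift:.4f} m/z at charge 2")
```

Unless the search is told to consider a given modification, a modified peptide simply will not match the parent mass computed from the database sequence.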

The typical "good" MS-based protein hit will correlate a protein with about 30% of its sequence covered, perhaps with 7 peptides identified. If you take a look at some of the proteomics studies currently being published, proteins are being identified with one or two peptides. This is most definitely a homology identification. In the strictest sense it is a homology identification at the peptide level and therefore, by definition, a homology identification at the protein level. To get closer to an absolute identification, one would need an accurate mass measurement of the intact protein to match against the peptide-correlated database hit. We are really identifying correlating peptides. This is not an identification; it is a correlation.
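Sequence coverage, as used above, is simply the fraction of residues in the database sequence that fall inside at least one matched peptide. A minimal Python sketch follows; the protein and peptide strings are made up for illustration and are not real data.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one matched peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical 20-residue "protein" and three matched peptides.
EXAMPLE_PROTEIN = "ACDEFGHIKLMNPQRSTVWY"
EXAMPLE_PEPTIDES = ["CDEF", "EFGH", "STV"]

coverage = sequence_coverage(EXAMPLE_PROTEIN, EXAMPLE_PEPTIDES)
print(f"Sequence coverage: {coverage:.0%}")
```

Overlapping peptides are counted once per residue, so coverage never exceeds 100%. Even at 30% coverage, 70% of the protein has not been observed at all, which is the point of calling the result a correlation.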

Discouraged? Don't be. The proteomics universe is a wild place, and while it is still wild there is much opportunity to be had. Will you be the next proteomics genius? Maybe. Read the next page to learn what we do to be more certain.


Copyright © 2004-2016 IonSource. All rights reserved.
Last updated:  Tuesday, January 19, 2016 02:48:32 PM
Note: Sequest™ is a registered trademark of Thermo Electron Corporation.