Protein Identification With Mass Spectrometry Data



Before affordable mass spectrometry based ID methods became available, Edman degradation was the method of choice for protein identification, and it is still a very powerful technique. With Edman sequencing, amino acids are cleaved one at a time from the N-terminus of a peptide or protein, and each released amino acid is then chromatographed using a 20 to 50 min HPLC gradient. Identification is based on correlating the retention time of the eluting amino acid with a standard chromatogram. The power of the Edman technique is that the exact sequence can often be determined, without the ambiguity that arises in MS when amino acids have isobaric masses. The Edman technique is simple and powerful, but it is slow, and it usually identifies one protein at a time. On average it takes about seven cycles of the sequencer to uniquely identify a protein in a sequence database. If you are running a 50 min gradient, that's about six hours for one protein ID!
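The two numbers in the paragraph above are easy to check for yourself. The short sketch below is only an illustration: the function name is made up for this example, the timing counts nothing but the HPLC gradient, and the residue masses are standard monoisotopic values in daltons.

```python
# Monoisotopic residue masses (Da) for two pairs that are hard to tell apart by MS.
RESIDUE_MASS = {
    "L": 113.08406,  # leucine
    "I": 113.08406,  # isoleucine, identical in mass to leucine
    "K": 128.09496,  # lysine
    "Q": 128.05858,  # glutamine, close to lysine
}


def edman_hours(cycles, gradient_minutes):
    """Hours of HPLC time for a given number of Edman cycles (gradient time only)."""
    return cycles * gradient_minutes / 60.0


# Seven cycles on a 50 min gradient, as in the text above.
hours = edman_hours(7, 50)

# Leucine and isoleucine cannot be separated by mass at all;
# lysine and glutamine differ by only a few hundredths of a dalton.
mass_gap = RESIDUE_MASS["K"] - RESIDUE_MASS["Q"]

print(f"Edman run time: {hours:.1f} h")
print(f"Lys - Gln mass difference: {mass_gap:.5f} Da")
```

Edman sequencing separates these residues by retention time, so the mass overlap does not matter there.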

Mass spectrometry uses peptide masses, or the MS/MS fragmentation of a peptide, to identify proteins. In stark contrast to Edman sequencing, mass spectrometry can easily ID 10-50 proteins in about 30 min! Here are a few of the most popular MS ID techniques.

  • Peptide Mass Fingerprinting: A protein is first digested with an enzyme, and the peptide masses are then used to search a sequence database.(1,2)
  • Sequence Tag: A peptide is fragmented in a mass spectrometer, and a short stretch of its amino acid sequence is then determined. With this technique, the parent mass of the peptide, the sequence of the tag, and the starting and ending masses of the tag are used to search a sequence database. Proteins can be correlated with the fragmentation of a single peptide using this technique.(3)
  • MS/MS Peptide Identification: A peptide is fragmented in a mass spectrometer, and the fragment ion masses are then used to search a sequence database. Proteins can be correlated with the fragmentation of a single peptide using this technique.(4)
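To make the first technique more concrete, here is a minimal sketch of a peptide mass fingerprint search. It is not the algorithm of any particular search engine. It assumes trypsin as the enzyme (cleavage after K or R, but not before P), no missed cleavages, no modifications, and neutral monoisotopic masses. The two database sequences, the function names, the simulated "observed" masses, and the 0.05 Da tolerance are all made up for illustration.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.05):
    """Count observed masses that fall within tolerance of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


# Hypothetical two-entry "database"; these sequences are invented.
TOY_DATABASE = {
    "protein_A": "MKWVTFISLLRGAPK",
    "protein_B": "MGSSHHHKLVPRGSHMR",
}

# Simulated measurement: protein_A's peptide masses with a 0.01 Da error added.
observed = [peptide_mass(p) + 0.01 for p in tryptic_digest(TOY_DATABASE["protein_A"])]

# Rank database entries by how many observed masses they explain.
ranking = sorted(
    ((name, count_matches(observed, seq)) for name, seq in TOY_DATABASE.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, hits in ranking:
    print(name, hits)
```

The entry that explains the most observed masses comes out on top. Real search engines go further and score how likely such a match is to occur by chance, which this sketch does not attempt.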

With all of these techniques, the larger the set of peptides identified for a given protein, the better the identification. Identification is really too strong a term; perhaps correlation is more appropriate. For example, if you can only identify a single peptide, then you may only be able to narrow the search down to a family of proteins. If you can only correlate a portion of the sequence, you will never be entirely sure whether you have identified the protein in the database or an entirely new splice variant with an entirely different function. Throughout the tutorial we will use the terms identification, ID, and correlation interchangeably.

If you are interested in MS ID philosophy and strategy, please continue on. If you want to get directly to the techniques and the examples, browse back to the table of contents and jump ahead to the techniques.


  1. Henzel WJ, Billeci TM, Stults JT, Wong SC, Grimley C, Watanabe C. Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc Natl Acad Sci U S A. 1993 Jun 1;90(11):5011-5.

  2. Henzel WJ, Watanabe C, Stults JT. Protein identification: the origins of peptide mass fingerprinting. J Am Soc Mass Spectrom. 2003 Sep;14(9):931-42.

  3. Mann M, Wilm M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem. 1994 Dec 15;66(24):4390-9.

  4. Yates JR 3rd, Eng JK, McCormack AL, Schieltz D. Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal Chem. 1995 Apr 15;67(8):1426-36.



Copyright © 2004-2016 IonSource. All rights reserved.
Last updated:  Tuesday, January 19, 2016 02:48:33 PM