Identification With Mass Spectrometry Data
Edman degradation (N-terminal protein sequencing) was the method of choice for
protein ID prior to the advent of mass spectrometry-based ID. Edman degradation
is still a very powerful technique. With Edman sequencing, amino
acids are cleaved one at a time from the N-terminus of a peptide or protein, and
each amino acid is then chromatographed using a 20 to 50 min HPLC gradient.
Identification is based on correlating the retention time of the eluting amino
acid to a standard chromatogram. The power of this technique is that
the exact sequence can often be determined, and there is none of the
confusion that arises in MS between amino acids with isobaric masses. The
technique is simple and powerful, but slow, and it usually identifies only
one protein at a time. On average it takes about seven cycles of the sequencer
to uniquely identify a protein in a sequence database. If you are running a
50 min gradient, that's about six hours (7 x 50 min = 350 min)!
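The isobaric-mass problem mentioned above can be made concrete with a few residue masses. The following is a minimal sketch, not part of the original tutorial: the function names and the tolerance values are illustrative assumptions, and the masses are the standard monoisotopic residue masses rounded to five decimals.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "L": 113.08406,  # leucine
    "I": 113.08406,  # isoleucine (same elemental composition as leucine)
    "K": 128.09496,  # lysine
    "Q": 128.05858,  # glutamine
}


def mass_difference(a, b):
    """Absolute mass difference between two residues, in Da."""
    return abs(RESIDUE_MASS[a] - RESIDUE_MASS[b])


def distinguishable(a, b, tolerance_da):
    """True if the residues differ by more than the given mass tolerance.

    tolerance_da is a hypothetical instrument mass tolerance in Da.
    """
    return mass_difference(a, b) > tolerance_da


for pair in (("L", "I"), ("K", "Q")):
    print(pair, round(mass_difference(*pair), 5))
```

Leucine and isoleucine have identical masses, so mass alone cannot separate them at any tolerance. Lysine and glutamine differ by only about 0.036 Da, so whether they can be told apart depends on the mass accuracy of the instrument. Edman sequencing avoids both cases because it identifies each residue by retention time rather than by mass.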
In stark contrast to Edman sequencing, mass spectrometry can easily ID 10-20
proteins in about 30 min! Identification of proteins by mass spectrometry uses
peptide masses or the MS/MS fragmentation of a peptide. Here is a short
description of a few of the most popular MS techniques.
- Peptide Mass Fingerprinting: A protein is first digested
with an enzyme and the peptide masses are then used to search a
sequence database.
- Sequence Tag: A peptide is fragmented in a mass
spectrometer and a short stretch of amino acids is then
determined. This "tag" (peptide mass, sequence of the
tag, and starting and ending mass of the tag) is used to search a
sequence database. Proteins can be correlated with the
fragmentation of a single peptide using this technique.(3)
- MS/MS Peptide Identification: A peptide is
fragmented in a mass spectrometer and the fragment ion masses are then
used to search a sequence database. Proteins can be correlated with the
fragmentation of a single peptide using this technique.(4)
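To make the first and last techniques more concrete, here is a minimal sketch under stated assumptions. It is not the algorithm of any particular search engine. The two-entry "database", the protein names, the 0.2 Da tolerance, and all function names are hypothetical. The digest assumes trypsin (cleavage after K or R, but not before P) with no missed cleavages, and the fragment calculation covers only singly charged b and y ions.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O in Da
PROTON = 1.00728   # mass of a proton in Da


def tryptic_digest(sequence):
    """Cleave after K or R, except before P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_score(observed_masses, protein, tolerance_da=0.2):
    """Count observed masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed_masses
    )


def by_ions(peptide):
    """Singly charged b and y fragment m/z values.

    b is ordered b1..b(n-1); y is ordered y(n-1)..y1.
    """
    cuts = range(1, len(peptide))
    b = [sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON for i in cuts]
    y = [sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
         for i in cuts]
    return b, y


# Hypothetical toy database; real searches use full sequence databases.
DATABASE = {
    "protein_A": "MAGKLSERVWK",
    "protein_B": "GASPKTTCCRLLLK",
}

# Pretend the observed masses came from a digest of protein_A.
observed = [peptide_mass(p) for p in tryptic_digest(DATABASE["protein_A"])]
best = max(DATABASE, key=lambda name: fingerprint_score(observed, DATABASE[name]))
print("best fingerprint match:", best)

# Theoretical fragment masses that an MS/MS search would compare to a spectrum.
b_ions, y_ions = by_ions("LSER")
print("b ions:", [round(m, 4) for m in b_ions])
print("y ions:", [round(m, 4) for m in y_ions])
```

In the fingerprinting part, the score is simply the number of observed peptide masses explained by a candidate protein, which is why more matched peptides give a more convincing correlation. Real tools also account for missed cleavages, modifications, and statistical significance, none of which is modeled here.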
With all of these techniques, the larger the set of peptides identified for a given protein, the better the
identification. Identification is really too strong a term; perhaps
correlation is more appropriate. Throughout the tutorial we will be
using identification, ID and correlation interchangeably. For example, if you can only identify a single
peptide, then you may only be able to narrow the search down to a family
of proteins. If you can only correlate a portion of the sequence, you will
never be entirely sure if you have identified the protein in the database
or an entirely new splice variant with an entirely different
If you are interested in MS ID
philosophy and strategy, please continue on. If you want to get directly to
the techniques and the examples, browse back to the table
of contents and jump ahead to the techniques.
(1) Henzel WJ, Billeci TM, Stults JT, Wong SC, Grimley C, Watanabe C. Identifying
proteins from two-dimensional gels by molecular mass searching of peptide
fragments in protein sequence databases. Proc Natl Acad Sci U S
A. 1993 Jun 1;90(11):5011-5.
(2) Henzel WJ, Watanabe C, Stults JT. Protein identification: the origins of
peptide mass fingerprinting. J Am Soc Mass Spectrom. 2003
(3) Mann M, Wilm M. Error-tolerant identification of peptides in
sequence databases by peptide sequence tags. Anal Chem. 1994
(4) Yates JR 3rd, Eng JK, McCormack AL, Schieltz D. Method
to correlate tandem mass spectra of modified peptides to amino acid
sequences in the protein database.
Anal Chem. 1995 Apr 15;67(8):1426-36.