Proteomics Protein Identification Philosophy
When one thinks of identification in a chemical sense, one thinks of
knowing a compound's absolute chemical composition to the last atom, and
not only the composition but also the arrangement of those atoms in space. In the following
tutorial we will demonstrate that many protein identification techniques are really
peptide identification techniques, and that these identifications
are in fact just correlations. To quote the author, "Most of these techniques
are peptide homology searches with a parent mass constraint!"
For example, it is possible to identify a peptide by parent mass,
correlate its MS/MS fragment ions to a sequence with a high score, and
still have several amino acid residues within that peptide in the wrong order
(see Figure 1 below). As shown in Figure 1, the MS/MS spectrum correlates
better to the actin eight sequence DFEAEMQTAASSSAL
in the bottom plot than it does to the actin one sequence DFEQEMATAASSSAL
in the upper plot. If the actin
eight sequence were not in the database, we would readily identify this
protein as actin one with high confidence! Edman sequencing would never make this
mistake. With MS identification it is
also possible to get the sequence wrong because the isobaric amino acids leucine and isoleucine,
and the nearly isobaric amino acids lysine and glutamine,
cannot easily be told apart by mass, so the matched database sequence may contain one where the protein contains the other.
These are just some of the caveats encountered in protein identification
with mass spectrometry data.
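To put numbers on the isobaric and nearly isobaric pairs mentioned above, here is a minimal sketch using standard monoisotopic residue masses. The function names and the 1500 Da peptide mass are assumptions chosen purely for illustration; they do not come from the tutorial.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "L": 113.08406,  # leucine
    "I": 113.08406,  # isoleucine (same elemental composition as leucine)
    "K": 128.09496,  # lysine
    "Q": 128.05858,  # glutamine
}


def mass_difference(a: str, b: str) -> float:
    """Absolute monoisotopic mass difference between two residues, in Da."""
    return abs(RESIDUE_MASS[a] - RESIDUE_MASS[b])


def ppm_of_peptide(delta_da: float, peptide_mass_da: float) -> float:
    """Express a mass difference as parts per million of a peptide mass."""
    return delta_da / peptide_mass_da * 1e6


if __name__ == "__main__":
    print(f"L vs I: {mass_difference('L', 'I'):.5f} Da")
    print(f"K vs Q: {mass_difference('K', 'Q'):.5f} Da")
    # Hypothetical 1500 Da peptide, used only to give a sense of scale.
    ppm = ppm_of_peptide(mass_difference("K", "Q"), 1500.0)
    print(f"K/Q swap in a 1500 Da peptide: {ppm:.1f} ppm")
```

Leucine and isoleucine differ by nothing at all, so no mass measurement of the residue separates them. Lysine and glutamine differ by a few hundredths of a dalton, which only a sufficiently accurate instrument will resolve.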
Figure 1. This one spectrum was correlated to two similar sequences in a protein database from the same species. Both correlations scored very high with the program Sequest™. The bottom correlation, actin eight, has a slightly higher score and, as shown to the left, better fragment peak coverage, so it is a better match than the upper correlation, actin one. Note that the amino acid residues Q and A have been switched in order, yielding two sequences that both correlate very well. Both of these Asp-N cleaved peptides have the same parent mass and many of the same "b" and "y" ions.
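The claim that the two actin peptides share a parent mass and most of their fragment ions can be checked with a short calculation. The sketch below is not how Sequest scores a spectrum; it only computes theoretical singly charged b and y ion masses from standard monoisotopic residue masses and counts how many coincide. All function names are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) for the residues found in the two peptides.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "F": 147.06841,
    "L": 113.08406, "M": 131.04049, "Q": 128.05858, "S": 87.03203,
    "T": 101.04768,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # mass of the charge-carrying proton

ACTIN_ONE = "DFEQEMATAASSSAL"
ACTIN_EIGHT = "DFEAEMQTAASSSAL"


def peptide_mass(seq):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def b_ions(seq):
    """Singly charged b1..b(n-1): N-terminal prefix masses plus a proton."""
    return [sum(RESIDUE_MASS[r] for r in seq[:i]) + PROTON
            for i in range(1, len(seq))]


def y_ions(seq):
    """Singly charged y1..y(n-1): C-terminal suffix masses plus water and a proton."""
    return [sum(RESIDUE_MASS[r] for r in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


def shared(ions_a, ions_b, tol=1e-6):
    """Count positions where two ion series agree within a tolerance (Da)."""
    return sum(1 for a, b in zip(ions_a, ions_b) if abs(a - b) <= tol)


if __name__ == "__main__":
    print("parent masses:", peptide_mass(ACTIN_ONE), peptide_mass(ACTIN_EIGHT))
    total = len(ACTIN_ONE) - 1
    print("shared b ions:", shared(b_ions(ACTIN_ONE), b_ions(ACTIN_EIGHT)), "of", total)
    print("shared y ions:", shared(y_ions(ACTIN_ONE), y_ions(ACTIN_EIGHT)), "of", total)
```

Because Q and A sit at positions 4 and 7, only the fragments that cut between those two positions (b4 to b6 and y9 to y11) differ. Every other fragment contains either both swapped residues or neither, and so has an identical mass.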
With mass spectrometry data we are at the mercy of the sequence database. If the protein sequence has not been entered in the database, then we obviously will not be able to identify it with MS matching techniques. Protein databases are constructed by humans, and sadly not by divine inspiration. Researchers usually contribute to public databases through protein Edman sequencing efforts made in their research projects. Other sequences are garnered directly from publications and patents. These sequences are prone to a number of errors: human sequencing errors, human transcription errors, typographical errors, and database curation errors. In addition, it is difficult to know the theoretical absolute chemical composition of large proteins because of the many modifications that occur within them. Degradation is common: oxidation of methionine, deamidation of asparagine and glutamine, backbone cleavages, and a host of other amino acid modifications can and do occur. Post-translational modifications are also common, for example glycosylation and phosphorylation, which can shift both mass and charge. For a larger, more complete list of modifications the reader is referred to UNIMOD, "A Database of Protein Modifications."
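As a rough illustration of how such modifications move a peptide away from its database mass, the sketch below applies a few well-known monoisotopic mass shifts. The dictionary covers only three of the modifications named above, the helper names are assumptions, and the 1000 Da starting mass is a made-up value. Only the mass shift is modeled here, not any change in charge state.

```python
PROTON = 1.007276  # mass of the charge-carrying proton, Da

# Monoisotopic mass shifts in Da for a few common modifications.
MOD_DELTA = {
    "oxidation": 15.994915,        # +O, e.g. on methionine
    "deamidation": 0.984016,       # Asn -> Asp or Gln -> Glu
    "phosphorylation": 79.966331,  # +HPO3, e.g. on Ser, Thr, or Tyr
}


def modified_mass(unmodified_mass, modifications):
    """Neutral mass after adding the shift of each listed modification."""
    return unmodified_mass + sum(MOD_DELTA[m] for m in modifications)


def mz(neutral_mass, charge):
    """m/z of a positive ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    base = 1000.0  # hypothetical unmodified peptide mass, illustration only
    for mods in (["oxidation"], ["deamidation"], ["oxidation", "phosphorylation"]):
        m = modified_mass(base, mods)
        print(mods, f"mass={m:.4f} Da", f"m/z at 2+ = {mz(m, 2):.4f}")
```

A search that does not allow for these shifts will simply miss the modified peptide, because its parent mass no longer matches the unmodified database sequence.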
The typical "good" MS-based protein hit will correlate a
protein with about 30% of its sequence covered, perhaps with 7 peptides
identified. If you take a look at some of the proteomics studies
currently being published, proteins are being identified with one or two
peptides. This is most definitely a homology identification.
In the strictest sense it is a homology identification at the peptide level and, by
definition, a homology identification at the protein level. To get
closer to an absolute identification one would need an accurate mass
measurement of the intact protein to match to the peptide-correlated
database hit. We are really
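Sequence coverage, as used above, is simply the fraction of a protein's residues that fall inside at least one identified peptide. A minimal sketch of that calculation follows; the protein and peptide strings are invented for illustration and the function name is an assumption.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one peptide match.

    Every occurrence of every peptide is marked, and overlapping
    peptides are counted only once per residue.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    # Hypothetical 22-residue sequence and peptides, for illustration only.
    protein = "MKTAYIAKQRQISFVKSHFSRQ"
    peptides = ["TAYIAK", "SHFSR"]
    print(f"coverage: {sequence_coverage(protein, peptides):.0%}")
```

Note what the number leaves out: at 30% coverage, 70% of the protein was never observed, so any sequence difference or modification in that unobserved region goes undetected.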
Discouraged? Don't be. The protein and proteomics universe is a wild place, and while it is still wild there is much opportunity to be had. Will you be the next proteomics genius? Maybe. Read the next page to learn what we do to be more certain.
Copyright © 2004-2016 IonSource. All rights reserved.
Last updated: Tuesday, January 19, 2016 02:48:32 PM
Note: Sequest™ is a registered trademark of Thermo Electron Corporation.