Protein ID With MS/MS Data
Introduction: We have already talked about how the technique called peptide-mass fingerprinting uses the parent mass of a peptide along with limited sequence data to get a best fit match to a protein in a sequence database. MS/MS spectral matching compares the uninterpreted peaks in a peptide fragment spectrum against theoretical fragment spectra derived from a sequence database. The MS/MS fragment spectrum is the result of collision-induced dissociation (CID) occurring within a mass spectrometer. The fragments are produced either in the collision cell of a tandem mass spectrometer or within an ion trap.

In an MS/MS-based protein ID experiment multiple peptides are usually found, and all of their fragment spectra are used to correlate to a protein. In both PMF and MS/MS ID, the larger the number of peptides identified, the greater the confidence in the protein correlation. A correlation made with multiple MS/MS spectra can easily be superior to the identification made in a PMF experiment. As an extreme example, a single peptide mass in a PMF experiment can never really be correlated to a protein, whereas the heterogeneity imparted to a single peptide in an MS/MS experiment can deliver enough amino acid sequence to correlate to a protein.

Why is this? Some may argue that a single peptide mass in a PMF experiment can be correlated if the mass is accurate enough. However, that mass could belong to anything, any carbon-based compound; a single mass lacks heterogeneity and carries little information. With MS/MS data the heterogeneity produced is rich, and a number of constraints can be placed. The primary constraint, which might be overlooked, is that the analyte is a peptide: remember, we have amino acid sequence evidence. Once we know this we can place the most important constraint, enzyme specificity.

With three constraints in place (enzyme specificity, intact peptide mass, and a partial amino acid sequence), the peptide correlation can be very strong even with a single peptide. Below is a step-by-step simplification of how sequence database search programs accomplish peptide and protein ID.
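To make the idea of a theoretical fragment spectrum concrete, here is a minimal Python sketch. It is an illustration only, not how any particular search program works: it assumes unmodified peptides, singly charged b- and y-ions only, and a simple peak-counting comparison. The function names and the 0.5 Da tolerance are choices made for this example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b-ion: N-terminal piece of length i, plus one proton.
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        # y-ion: the complementary C-terminal piece, plus water and one proton.
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON)
    return b_ions, y_ions


def count_matches(observed_mz, peptide, tolerance=0.5):
    """Count theoretical ions that have an observed peak within the tolerance."""
    b_ions, y_ions = fragment_ions(peptide)
    return sum(
        any(abs(obs - theo) <= tolerance for obs in observed_mz)
        for theo in b_ions + y_ions
    )
```

Real search programs use more elaborate scoring than a raw match count, but the comparison of observed peaks against predicted fragment masses is the common starting point.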
A Step by Step General Scheme for MS/MS Protein ID
Step 1. The search usually begins by placing
an enzyme specificity constraint. Some programs pre-index the sequence
database based on the enzyme specificity, which makes the search go
much faster. For example, theoretical tryptic peptides can be generated, and their mass lists
pre-computed, from a protein database. The downside is that a separate index must be built for each
enzyme, and rebuilt each time a potential modification is changed.
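The pre-indexing idea in Step 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the indexing scheme of any specific program: trypsin is modeled by the common rule "cleave after K or R, but not before P", missed cleavages and modifications are ignored, and the names `tryptic_digest`, `build_index`, and `candidates` are invented for this example.

```python
import bisect

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Split a protein after each K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Build a mass-sorted list of (mass, peptide, protein_id) entries."""
    entries = [
        (peptide_mass(peptide), peptide, protein_id)
        for protein_id, sequence in proteins.items()
        for peptide in tryptic_digest(sequence)
    ]
    entries.sort()
    return entries


def candidates(index, mass, tolerance=0.5):
    """Return index entries whose mass lies within the tolerance of the query."""
    # A real index would store this mass list once instead of rebuilding it.
    masses = [entry[0] for entry in index]
    lo = bisect.bisect_left(masses, mass - tolerance)
    hi = bisect.bisect_right(masses, mass + tolerance)
    return index[lo:hi]
```

Because the digestion rule and the residue masses are baked into the index, changing the enzyme or adding a modification means rebuilding it, which is the downside noted in Step 1.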
Step 5. If more than one peptide is
searched, all of the peptides found are correlated to their respective
proteins. The protein with the greatest number of well-correlated
peptides is usually the most significant hit.
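Step 5 amounts to grouping peptide hits by protein and ranking the proteins. The Python sketch below shows one simple way to do that. The input format, the `min_score` cutoff, and the function name `rank_proteins` are assumptions made for illustration; actual search programs use their own scoring and protein inference rules.

```python
from collections import defaultdict


def rank_proteins(peptide_hits, min_score=0.0):
    """Rank proteins by their number of distinct well-correlated peptides.

    peptide_hits is an iterable of (peptide, protein_id, score) tuples,
    where score is whatever peptide-level score the search produced.
    """
    supporting = defaultdict(set)
    for peptide, protein_id, score in peptide_hits:
        # Keep only peptides whose score passes the (hypothetical) cutoff.
        if score >= min_score:
            supporting[protein_id].add(peptide)
    # Most supporting peptides first; ties broken by protein id for stability.
    return sorted(supporting.items(), key=lambda item: (-len(item[1]), item[0]))
```

Using a set per protein means a peptide observed in several spectra is counted once, so the ranking reflects distinct peptides and not the raw number of spectra.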
Simple MS/MS Protein ID Exercise:
Use the MS/MS spectrum in Figure 1 below to search
a sequence database. For this example we will be using the web version
of the database searching program Mascot.
|Figure 1. This is a peptide fragment spectrum collected on an LCQ ion trap mass spectrometer. Note the parent m/z value 690.39 in the upper left hand header of the spectrum.|
You can use the dta file
that we have already created for you. Right
click on this dta link,
choose "save target as", and save the file to your computer. In the
save dialog you may
need to select "all files" as the file type, and you may also need
to add the suffix .dta to
the "msms" name so that your computer gets the file type
correct. A dta file
is a mass / intensity pair list that represents the original
MS/MS spectrum; it is a Thermo Finnigan file format. The dta file
is very small, a few KB. Once you have downloaded it you can, if you want,
open it with Notepad to inspect the simple file format. The file should
have the name msms.dta
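If you would rather inspect the file with a script than with Notepad, a dta file can be read with a few lines of Python. The sketch below assumes the usual dta layout: a first line holding the singly protonated precursor mass (MH+) and the charge state, followed by one m/z and intensity pair per line. The sample text used in the note after the code is invented for illustration and is not the contents of msms.dta.

```python
PROTON = 1.00728


def parse_dta(text):
    """Parse dta text into (mh_plus, charge, peaks).

    peaks is a list of (mz, intensity) tuples. Blank lines are skipped.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    mh_plus, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(fields[0]), float(fields[1])) for fields in rows[1:]]
    return mh_plus, charge, peaks


def precursor_mz(mh_plus, charge):
    """Convert the MH+ value on the header line to the m/z at the given charge."""
    return (mh_plus + (charge - 1) * PROTON) / charge
```

For example, calling `parse_dta` on the made-up text `"1379.77 2\n175.12 1200.0\n262.15 850.5\n"` gives a charge of 2 and two peaks, and `precursor_mz` then converts the header mass to the m/z you would expect to see in a spectrum header.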
1.) Follow this link
to the Mascot web portal.
4.) We will make this easy
The search should only take a minute to complete. Do not browse away; the result will come up automatically. If you enter your correct e-mail address, Mascot will send you a link to your results via e-mail. When the results come up, click around and explore the different links associated with your result.
Inspect the results page:
|What peptide/protein was identified? How many peaks were identified? What peaks went unidentified? How long did the search take? If you had 2000 spectra how long do you think it would take?|
Mass spectrometry-based protein identification is too easy, right? So far we have used three MS sequence database searching techniques to identify a single protein. Continue on to the final tutorial to learn how to identify many proteins in a somewhat typical proteomics experiment.
Copyright © 2004-2016 IonSource. All rights reserved.
Last updated: Tuesday, January 19, 2016 02:48:20 PM