Searching A Sequence Database With Complex Proteomic Data



So far we have learned how to identify proteins by peptide-mass fingerprinting, with sequence tags, and with uninterpreted MS/MS data. Most of the procedures we have performed have been fairly manual. Proteomic data is generally gathered during an LC/MS run; that is, mass spectra are collected during an HPLC separation of a complex mixture of peptides. Modern mass spectrometers can collect 2 to 4 fragment spectra per second. Most proteomic LC/MS analyses last at least three hours, and most multidimensional separations, for example strong cation exchange followed by reverse phase LC/MS, can take anywhere from 12 to 24 hours, and that is for a single sample. A three-hour run with a mass spectrometer collecting 4 fragment spectra every second could yield about 43,200 spectra (3 h x 3,600 s/h x 4 spectra/s). If you don't have a hundred graduate students, then sophisticated software is your only option.
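
The spectrum counts quoted in this module follow from simple arithmetic on run length and acquisition rate. The short Python sketch below reproduces them; the function name `spectra_collected` is made up for illustration, and the result is an upper bound that assumes the instrument acquires fragment spectra continuously for the whole run.

```python
def spectra_collected(run_minutes: float, spectra_per_second: float) -> int:
    """Upper bound on fragment spectra acquired in one LC/MS run.

    Assumes the instrument collects fragment spectra at a constant
    rate for the entire run, which real acquisitions rarely achieve.
    """
    return round(run_minutes * 60 * spectra_per_second)


# Three-hour run at 4 fragment spectra per second.
three_hour_run = spectra_collected(180, 4)

# 24-hour multidimensional separation at the same rate.
full_day_run = spectra_collected(24 * 60, 4)

# The 10 min run used in the exercise, at 2 fragment spectra per second.
exercise_run = spectra_collected(10, 2)

print(three_hour_run, full_day_run, exercise_run)
```

Even the 10 minute exercise run produces more spectra than anyone would want to interpret by hand, which is the point of the paragraph above.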

A Complex Proteomic Protein ID Exercise: 

Use the short but complex LC/MS run shown in Figure 1 below to search a sequence database. For this example you will be using the web version of Phenyx. You can log in as a guest; however, we encourage you to create a free account so that you can track your submissions.


Figure 1. This is a complex peptide separation collected on an LCQ DECA XP Plus ion trap mass spectrometer over a relatively short period of time, 10 min. An entire proteome was digested with trypsin. A small aliquot was separated on a 0.075 x 100 mm reverse phase column. Two fragment spectra were collected every second, so there are about 1,200 spectra in the 10 min separation shown above.
To simplify the exercise, three hundred of the most intense fragment spectra were concatenated into a simple Notepad text file. Right click here and choose "Save Target As...", then choose a location on your computer to save it. You will need to "un-zip" this proteomics file. Log in to the web version of Phenyx and press the submission button; the page shown in Figure 2 will come up. Some of the entry information on the page shown below has been changed to match our data file, so you will need to change some of the fields to match it.
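
For readers curious how such a file could be assembled, here is a minimal Python sketch of picking the N most intense fragment spectra and concatenating them as text. It is not the procedure used to build the exercise file. Several things are assumptions: the `Spectrum` type and function names are hypothetical, "most intense" is taken to mean the largest summed fragment intensity, each spectrum is written dta-style (a first line with the singly protonated precursor mass and charge, then one m/z and intensity pair per line), and spectra are separated by a blank line.

```python
from typing import NamedTuple


class Spectrum(NamedTuple):
    """Hypothetical in-memory fragment spectrum."""
    mh_plus: float                      # singly protonated precursor mass
    charge: int                         # assumed precursor charge state
    peaks: list[tuple[float, float]]    # (m/z, intensity) pairs


def total_intensity(spectrum: Spectrum) -> float:
    """Summed fragment intensity, used here as the ranking criterion."""
    return sum(intensity for _, intensity in spectrum.peaks)


def top_n(spectra: list[Spectrum], n: int) -> list[Spectrum]:
    """Return the n spectra with the highest summed intensity."""
    return sorted(spectra, key=total_intensity, reverse=True)[:n]


def to_dta_text(spectra: list[Spectrum]) -> str:
    """Concatenate spectra as dta-style text blocks.

    The blank-line separator between spectra is an assumption, not a
    documented requirement of any particular search engine.
    """
    blocks = []
    for s in spectra:
        lines = [f"{s.mh_plus:.4f} {s.charge}"]
        lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in s.peaks]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Calling `to_dta_text(top_n(all_spectra, 300))` on a list of about 1,200 spectra would give a text file comparable in size to the one used in this exercise.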



Figure 2.   This is the data entry page that comes up after pressing the submission button. 


Data Entry Instructions: You will need to make a few changes to this entry page to match your data file. Choose NCBInr as the database and Insecta under Taxonomy. At the bottom of the page, browse to the proteomics data you downloaded previously and choose proteome.dta. The instrument used was an ESI ion trap (LCQ); select these options in the top right section of the entry page. Choose Cys_PAM because the cysteine residues in this proteome were modified with acrylamide, adding 71 u to each Cys residue. Set the parent error mass tolerance to 2 Da. Choose dta as the data format. Also choose parent charge 1, 2, 3. Press the Submit button at the bottom of the page. The search will take a few minutes because there are many MS/MS spectra in the file you just submitted.
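
To see what the Cys_PAM, parent tolerance, and parent charge settings mean numerically, consider the Python sketch below. It is only an illustration under stated assumptions, not a description of how Phenyx scores matches. It uses standard monoisotopic residue masses, takes the acrylamide adduct as 71.03711 u per cysteine (the "71 u" in the instructions), and tries charge states 1, 2 and 3 for an observed parent m/z. The function names and the example peptide ACDK are hypothetical.

```python
# Standard monoisotopic residue masses in u.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CYS_PAM = 71.03711  # acrylamide adduct on Cys (assumed exact value of "71 u")


def peptide_mass(sequence: str, cys_shift: float = CYS_PAM) -> float:
    """Neutral monoisotopic mass with every Cys carrying cys_shift."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in sequence)
    return mass + cys_shift * sequence.count("C")


def neutral_mass_candidates(observed_mz: float, charges=(1, 2, 3)) -> dict:
    """Neutral parent mass implied by each allowed charge state."""
    return {z: (observed_mz - PROTON) * z for z in charges}


def matching_charges(observed_mz: float, sequence: str,
                     tolerance_da: float = 2.0) -> list:
    """Charge states whose implied mass is within tolerance of the peptide."""
    target = peptide_mass(sequence)
    candidates = neutral_mass_candidates(observed_mz)
    return [z for z, m in candidates.items() if abs(m - target) <= tolerance_da]


# Hypothetical doubly charged parent ion of the peptide ACDK.
print(matching_charges(254.1152, "ACDK"))
```

Leaving out the modification would shift the expected mass of every cysteine-containing peptide by about 71 u, far outside a 2 Da window, which is why the Cys_PAM setting matters for this data set.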

Results: Explore the results page.  How many proteins were identified in this short proteomics experiment?  Notice the sequence coverage for each protein.  How many proteins have two or more peptides identified?  What type of species do you think these proteins came from?  
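
The questions above can also be answered programmatically once identifications are in hand. The Python sketch below assumes a hypothetical list of (protein, peptide) pairs typed in from the results page; it does not read any real Phenyx export. It counts proteins supported by two or more distinct peptides and computes sequence coverage as the fraction of residues covered by at least one identified peptide.

```python
from collections import defaultdict


def peptides_per_protein(hits):
    """Group distinct peptide sequences by protein accession."""
    grouped = defaultdict(set)
    for protein, peptide in hits:
        grouped[protein].add(peptide)
    return dict(grouped)


def proteins_with_min_peptides(hits, minimum=2):
    """Proteins identified by at least `minimum` distinct peptides."""
    grouped = peptides_per_protein(hits)
    return sorted(p for p, peps in grouped.items() if len(peps) >= minimum)


def sequence_coverage(protein_seq, peptides):
    """Fraction of residues covered by at least one identified peptide."""
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for pep in peptides:
        start = protein_seq.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return sum(covered) / len(protein_seq)


# Hypothetical identifications: (accession, peptide) pairs.
hits = [("protA", "MK"), ("protA", "AYIAK"), ("protB", "LLGR")]
print(proteins_with_min_peptides(hits))
print(sequence_coverage("MKTAYIAKQR", ["MK", "AYIAK"]))
```

Proteins identified by a single peptide are generally treated with more caution than those supported by two or more, which is why the exercise asks you to count them separately.
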
Conclusion: All right, it's official: you are a proteomics expert. Don't forget to pick up your certificate as you exit. Oh yes, one last thing before we can issue your certificate: please proceed to the additional reading and tutorial reference page and complete the assigned reading. Congratulations!

Suggested Reading For This Module:

  1. Olsen JV, Mann M. Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proc Natl Acad Sci U S A. 2004 Sep 14;101(37):13417-22. Epub 2004 Sep 03. PMID: 15347803

  2. Colinge J, Masselot A, Cusin I, Mahé E, Niknejad A, Argoud-Puy G, Reffas S, Bederr N, Gleizes A, Rey P-A, Bougueleret L. "High performance peptide identification by tandem mass spectrometry allows reliable automatic data processing in proteomics," Proteomics. 2004 Jul;4(7):1977-84.

  3. Colinge J, Masselot A. "Mass spectrometry has married statistics: uncle is functionality, children are selectivity and sensitivity," DDT: TARGETS, Vol. 3, No. 2 (Suppl.), 2004, pp. 50-55.

  4. Magnin J, Masselot A, Menzel C, Colinge J. OLAV-PMF: a novel scoring scheme for high-throughput peptide mass fingerprinting. J Proteome Res. 2004 Jan-Feb;3(1):55-60. PMID: 14998163

  5. Colinge J, Magnin J, Masselot A. A systematic statistical analysis of ion trap tandem mass spectra in view of peptide scoring. In: Proceeding of the Workshop on Algorithms in Bioinformatics (WABI), R. Page and G. Benson (Eds), Budapest, September 2003, LNBI 2812, Springer, 25-38, 2003.

  6. Colinge J, Masselot A, Giron M, Dessingy T, Magnin J. "OLAV: towards high-throughput tandem mass spectrometry data identification," Proteomics, Vol. 3, No. 8, August 2003, pp. 1454-1463.

  7. McDonald WH, Yates JR 3rd. Shotgun proteomics: integrating technologies to answer biological questions. Curr Opin Mol Ther. 2003 Jun;5(3):302-9. Review. PMID: 12870441

  8. Colinge J, Magnin J, Dessingy T, Giron M, Masselot A. Improved peptide charge state assignment. Proteomics. 2003 Aug;3(8):1434-40. PMID: 12923768

  9. Lin D, Tabb DL, Yates JR 3rd. Large-scale protein identification using mass spectrometry. Biochim Biophys Acta. 2003 Mar 21;1646(1-2):1-10. Review. PMID: 12637006

  10. Hunter TC, Andon NL, Koller A, Yates JR, Haynes PA. The functional proteomics toolbox: methods and applications. J Chromatogr B Analyt Technol Biomed Life Sci. 2002 Dec 25;782(1-2):165-81. Review. PMID: 12458005

  11. McDonald WH, Yates JR 3rd. Shotgun proteomics and biomarker discovery. Dis Markers. 2002;18(2):99-105. Review. PMID: 12364816





Copyright © 2004-2016  IonSource  All rights reserved. 
Last updated:  Tuesday, January 19, 2016 02:48:32 PM