Interrogating the human genome using uninterpreted mass spectrometry data.

Jyoti S Choudhary, David M Creasy, Walter P Blackstock, John S Cottrell

doi:10.1002/1615-9861(200104)1:5<651::aid-prot651>3.0.co;2-n




Abstract

The public availability of a draft assembly of the human genome has enabled us to demonstrate, for the first time, the feasibility of searching a complete, unmasked eukaryotic genome using uninterpreted mass spectrometry data. A complex LC-MS/MS data set, containing peptides from at least 22 human proteins, was searched against a comprehensive, nonidentical protein database, an expressed sequence tag (EST) database, and the International Human Genome Project draft assembly of the human genome. The results from the three searches are compared in detail, and the merits of the different databases for this application are discussed. In the case of the EST database, the UniGene index provided a method of simplifying and summarising the search results. In the case of the genomic DNA, the presence of introns prevented matching of roughly one quarter of the spectra, but the technique can provide primary experimental verification of predicted coding sequences, and has the potential to identify novel coding sequences.
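The effect of introns described above can be illustrated with a toy example. The sketch below is not the authors' software or method, which the abstract does not describe. It assumes a simple six-frame translation of the nucleotide sequence followed by exact substring matching of peptide sequences, whereas a real search would score uninterpreted MS/MS spectra. The sequences `EXON1`, `INTRON`, and `EXON2` and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def peptide_in_sequence(peptide, dna):
    """True if the peptide occurs contiguously in any of the six frames."""
    return any(peptide in frame for frame in six_frame(dna))


# Hypothetical toy gene: two exons separated by a short intron (GT...AG).
EXON1 = "ATGAAGACGGCGTACCTG"   # encodes MKTAYL
INTRON = "GTAAGTCCTTTTAG"
EXON2 = "GTGGACGAGAGCGGGAGG"   # encodes VDESGR

genomic = EXON1 + INTRON + EXON2   # unspliced genomic DNA
mrna = EXON1 + EXON2               # spliced transcript (as DNA letters)

for peptide in ("MKTAYL", "VDESGR", "AYLVDE"):
    print(
        peptide,
        "genomic:", peptide_in_sequence(peptide, genomic),
        "spliced:", peptide_in_sequence(peptide, mrna),
    )
```

In this toy case the two peptides that lie within a single exon are found in the genomic translation, while `AYLVDE`, which spans the exon junction, is found only in the spliced sequence. This is the kind of miss the abstract attributes to introns.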
