Abstract

Background: Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes undergo alternative splicing. High-throughput tandem (MS/MS) mass spectrometry provides valuable information for rapidly identifying potentially novel alternatively spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched.

Results: We wrote scripts using Perl, BioPerl, MySQL and the Ensembl API, and built a theoretical exon-exon junction protein database from the Ensembl Core Database. The database accounts for all possible combinations of exons for a gene while keeping the frame of translation (i.e., keeping only in-phase exon-exon combinations). Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events.

Conclusion: Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.

Highlights

  • Alternative splicing is an important gene regulation mechanism

  • With the advances in mass spectrometry (MS) and the large-scale generation of MS/MS-based proteomics data, it has become clear that MS-based peptide sequence data can be mined to identify and validate alternative splicing events of genes

  • Mass spectrometry database searches use databases consisting of known or putatively translated protein sequences, which are biased towards well-known proteins or their common alternatively spliced isoforms that exist in the database


Summary

Introduction

Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes undergo alternative splicing. The ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched. With the vast number of Expressed Sequence Tags (ESTs) generated by the EST sequencing projects [4], and more recent developments in direct mRNA sequencing (mRNA-Seq) by next-generation sequencing technologies, many alternative splicing events have been identified and annotated in the human genome. With the advances in mass spectrometry (MS) and the large-scale generation of MS/MS (tandem MS)-based proteomics data, it has become clear that MS-based peptide sequence data can be mined to identify and validate alternative splicing events of genes. Mass spectrometry database searches use databases consisting of known or putatively translated protein sequences, which are biased towards well-known proteins or their common alternatively spliced isoforms that exist in the database. The six-frame translation approach [6,7] does not take into account all potential splicing possibilities.
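
The in-phase exon-exon junction idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (their scripts were written in Perl against the Ensembl API); the `Exon` class and both function names are hypothetical. The sketch assumes that each exon carries Ensembl-style `phase` and `end_phase` values (0, 1 or 2, with -1 for a non-coding boundary) and that "in-phase" means the upstream exon's `end_phase` equals the downstream exon's `phase`.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; field names mirror Ensembl's phase/end_phase."""
    rank: int       # position of the exon within the transcript (1-based)
    phase: int      # codon phase at the exon start (0, 1, 2; -1 if non-coding)
    end_phase: int  # codon phase at the exon end (0, 1, 2; -1 if non-coding)


def in_phase_junctions(exons):
    """Return (upstream_rank, downstream_rank) pairs that preserve the frame.

    Assumption: a junction is in-phase when the upstream end_phase matches
    the downstream phase, and neither boundary is non-coding (-1).
    """
    ordered = sorted(exons, key=lambda e: e.rank)
    junctions = []
    # combinations() on the rank-sorted list only yields upstream -> downstream pairs
    for up, down in combinations(ordered, 2):
        if up.end_phase == -1 or down.phase == -1:
            continue
        if up.end_phase == down.phase:
            junctions.append((up.rank, down.rank))
    return junctions


def exon_skipping_junctions(exons):
    """Keep only junctions that skip at least one intervening exon."""
    return [(u, d) for u, d in in_phase_junctions(exons) if d - u > 1]


# Toy gene with four coding exons (phases invented for illustration only)
toy_gene = [
    Exon(rank=1, phase=0, end_phase=1),
    Exon(rank=2, phase=1, end_phase=0),
    Exon(rank=3, phase=0, end_phase=1),
    Exon(rank=4, phase=1, end_phase=2),
]

print(in_phase_junctions(toy_gene))       # [(1, 2), (1, 4), (2, 3), (3, 4)]
print(exon_skipping_junctions(toy_gene))  # [(1, 4)]
```

In this toy gene, joining exon 1 directly to exon 4 keeps the reading frame while skipping exons 2 and 3, so a peptide spanning that junction would be absent from a database built only from annotated isoforms. A real pipeline would additionally translate the sequence around each junction into a peptide entry, which this sketch omits.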
