Abstract

Massively parallel DNA sequencing enables the detection of thousands of germline and somatic single nucleotide variants (SNVs) in cancer samples. The functional analysis of these mutations is often carried out through in silico predictions, with further downstream experimental validation rarely performed. Here, we examine the potential of using mass spectrometry-based proteomics data to further annotate the function of SNVs in cancer samples. RNA-seq and whole genome sequencing (WGS) data from Jurkat cells were used to construct a custom database of single amino acid variant (SAAV)-containing peptides, and over 1,000 such peptides were identified in two Jurkat proteomics datasets. The analysis enabled the detection of a truncated form of the splicing regulator YTHDC1 at the protein level. To extend the functional annotation further, a Jurkat phosphoproteomics dataset was analysed, identifying 463 SAAV-containing phosphopeptides. Among these phosphopeptides, 24 SAAVs were found to directly impact the phosphorylation event through the creation of either a phosphorylation site or a kinase recognition motif. We identified a novel phosphorylation site created by a SAAV in the splicing factor SF3B1, a protein that is frequently mutated in leukaemia. To our knowledge, this is the first study to use phosphoproteomics data to directly identify novel phosphorylation events arising from the creation of phosphorylation sites by SAAVs. Our study reveals multiple functional mutations impacting the splicing pathway in Jurkat cells and demonstrates the potential benefits of an integrative proteogenomics analysis for high-throughput functional annotation of SNVs in cancer.

Highlights

  • DNA sequencing technologies have enabled the rapid identification of single nucleotide variants (SNVs) within cancer genomes

  • A possible reason for this is that the RNA sequencing (RNA-seq) data for The Cancer Genome Atlas (TCGA) colorectal cancer samples were not sufficiently deep for calling single nucleotide variants (SNVs) (~10M 50 bp single-end reads versus ~100M 100 bp paired-end reads for the Jurkat dataset from Sheynkman et al. [4])

  • To our knowledge, ours is the first study to have used phosphoproteomics data to directly identify novel phosphorylation events arising from the creation of phosphorylation sites by single amino acid variants (SAAVs)


Introduction

DNA sequencing technologies have enabled the rapid identification of single nucleotide variants (SNVs) within cancer genomes. Hundreds to thousands of nonsynonymous SNVs can be routinely identified within gene coding regions of individual cancers through whole genome or exome sequencing (WGS/WXS) [1], and expressed SNVs can be detected by RNA sequencing (RNA-seq) [2]. To determine whether these SNVs are functionally important in cancer, in silico functional annotation tools are often used. In a larger-scale analysis by The Cancer Genome Atlas (TCGA) [5], 86 colorectal cancer proteomics datasets were searched against customised databases consisting of variants detected from sample-matched RNA-seq data. The proteomics data were generally of lower depth than the Jurkat dataset (~100K spectra per sample versus ~500K spectra for the Jurkat dataset).
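
The idea of a customised database of variant peptides can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names `tryptic_peptides` and `saav_peptides` are hypothetical, the digestion rule is a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the example sequence is invented. Given a reference protein and one amino acid substitution, it returns the tryptic peptides that exist only in the variant protein, which are the entries a SAAV-aware search database would need to contain.

```python
import re


def tryptic_peptides(protein):
    """Digest a protein in silico with a simplified trypsin rule.

    Assumption: cleave C-terminal to K or R, but not before P,
    and allow no missed cleavages.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def saav_peptides(reference, position, ref_aa, alt_aa):
    """Return tryptic peptides present in the variant protein only.

    `position` is 1-based, as in conventional variant notation.
    """
    if reference[position - 1] != ref_aa:
        raise ValueError("reference residue does not match the sequence")
    variant = reference[:position - 1] + alt_aa + reference[position:]
    reference_digest = set(tryptic_peptides(reference))
    return [p for p in tryptic_peptides(variant) if p not in reference_digest]


# Invented toy sequence: an A-to-S substitution at position 7.
variant_only = saav_peptides("MASKTEAPRGLK", 7, "A", "S")
```

In this toy case the substitution introduces a serine, so `variant_only` holds the single peptide `TESPR`. A substitution to K or R would instead create a new cleavage site and yield two shorter variant-only peptides.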
