CusVarDB: A tool for building customized sample-specific variant protein database from next-generation sequencing datasets.

Sandeep Kasaragod, Varshasnata Mohanty, Prashant Kumar Modi, T S Keshava Prasad, Sneha M Pinto, Harsha Gowda, Arun H Patil, Ankur Tyagi, Santosh Kumar Behera

doi:10.12688/f1000research.23214.1

Open Access

https://doi.org/10.12688/f1000research.23214.1

Abstract

Cancer genome sequencing studies have revealed a number of variants in the coding regions of several genes. Some of these coding variants play an important role in activating specific pathways that drive proliferation. Coding variants presented on cancer cell surfaces by the major histocompatibility complex serve as neo-antigens and result in immune activation. The success of immune therapy in patients is attributed to the neo-antigen load on cancer cell surfaces. However, which coding variants are expressed at the protein level cannot be predicted from genomic data. Complementing genomic data with proteomic data can potentially reveal the coding variants that are expressed at the protein level. Identifying variant peptides from mass spectrometry data nevertheless remains challenging because of the lack of an appropriate tool that integrates genomic and proteomic data analysis pipelines. To overcome this problem, and for the ease of biologists, we have developed a graphical user interface (GUI)-based tool called CusVarDB. It integrates a variant calling pipeline to generate a sample-specific variant protein database from next-generation sequencing datasets. We validated the tool with triple-negative breast cancer cell line datasets and identified 423, 408, 386 and 361 variant peptides from the BT474, MDMAB157, MFM223 and HCC38 datasets, respectively.

Highlights

Cancer genome sequencing projects have revealed thousands of genomic variations in cancers (Forbes et al, 2010; Tomczak et al, 2015; Tate et al, 2019; Zhang et al, 2011).
The V600E mutation in the BRAF gene is known to increase the likelihood of metastatic melanoma (Chapman et al, 2011).
Some of these mutant proteins are proteolytically processed in cancer cells, resulting in major histocompatibility complex (MHC) presentation of mutant peptides.

Summary

Introduction

Cancer genome sequencing projects have revealed thousands of genomic variations in cancers (Forbes et al, 2010; Tomczak et al, 2015; Tate et al, 2019; Zhang et al, 2011). The V600E mutation in the BRAF gene is known to increase the likelihood of metastatic melanoma (Chapman et al, 2011). Some of these mutant proteins are proteolytically processed in cancer cells, resulting in major histocompatibility complex (MHC) presentation of mutant peptides. Such investigations have generally used workflows in which experimentally derived tandem mass spectrometry data are searched against a reference protein database to identify and quantify proteins (Kelkar et al, 2014). Such a reference database usually lacks the sample-specific amino acid variations brought about by genomic aberrations and coding SNPs reported for various cancers. We developed CusVarDB with a built-in genomics pipeline that derives variants and creates custom variant protein databases.
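
The gap between a reference database and a sample-specific one can be illustrated with a small example. The following is a minimal Python sketch, not CusVarDB's implementation, of how a single amino acid substitution might be applied to a reference protein sequence to produce a variant FASTA entry. The function names, the `TOY1` identifier, the header format and the toy sequence are all hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """A single amino acid substitution such as V600E (hypothetical helper type)."""
    ref: str  # reference residue, one-letter code
    pos: int  # 1-based position in the protein
    alt: str  # variant residue, one-letter code


def parse_substitution(label: str) -> Substitution:
    """Parse a label of the form <ref><position><alt>, e.g. 'V600E'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return the sequence with the substitution applied, checking the reference residue."""
    index = sub.pos - 1  # convert 1-based position to 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {sub.pos} is outside the sequence")
    if sequence[index] != sub.ref:
        raise ValueError(
            f"expected {sub.ref} at position {sub.pos}, found {sequence[index]}"
        )
    return sequence[:index] + sub.alt + sequence[index + 1:]


def variant_fasta_entry(protein_id: str, sequence: str, label: str, width: int = 60) -> str:
    """Build a FASTA entry for the variant protein (the header format is an assumption)."""
    variant = apply_substitution(sequence, parse_substitution(label))
    header = f">{protein_id}_{label}"
    lines = [variant[i:i + width] for i in range(0, len(variant), width)]
    return "\n".join([header, *lines])


# Toy data: a made-up 10-residue sequence, not a real protein.
toy_reference = "MKTAYIAKQR"
entry = variant_fasta_entry("TOY1", toy_reference, "A4E")
print(entry)
```

Checking the reference residue before substituting is a deliberate choice in this sketch: a mismatch usually means the variant was annotated against a different protein isoform, and silently writing such an entry would add peptides to the search database that the sample cannot contain.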

Methods

Findings

Conclusions