A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE)

Tsung-Jung Wu, Raja Mazumder, Krista Smith, Yang Pan, Vahan Simonyan, Daniel J Crichton, Amirhossein Shamsaddini

doi:10.1093/database/bau022

Open Access

Journal: Database | Publication Date: Jan 1, 2014
Citations: 65 | License type: CC BY 3.0

Affiliation: George Washington University

Abstract

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies.

Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu
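The abstract's first step, scanning reads for nsSNVs, depends on deciding whether a single-base change in a coding sequence alters the encoded amino acid. The Python sketch below shows that classification using the standard genetic code (NCBI translation table 1). It is not the HIVE pipeline. The function name `classify_snv` and its inputs (a reference coding sequence, a 1-based position within it, and an alternate base) are hypothetical. The sketch also assumes that read alignment, variant calling and the mapping of genomic coordinates onto transcripts have already happened upstream.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, in T, C, A, G order, matching AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence (hypothetical helper).

    cds: reference coding sequence, 5'->3', starting at the first base of the start codon
    pos: 1-based position of the substituted base within cds
    alt: alternate base (A, C, G or T)
    Returns (protein_change, kind). kind is one of 'synonymous', 'missense',
    'nonsense' or 'stop-lost'. Everything except 'synonymous' changes the protein.
    """
    cds, alt = cds.upper(), alt.upper()
    if alt not in BASES:
        raise ValueError(f"unsupported alternate base: {alt!r}")
    if not 1 <= pos <= len(cds):
        raise ValueError("position outside coding sequence")
    codon_index = (pos - 1) // 3          # 0-based codon number
    offset = (pos - 1) % 3                # position of the change inside the codon
    ref_codon = cds[codon_index * 3: codon_index * 3 + 3]
    if len(ref_codon) != 3:
        raise ValueError("substitution falls in an incomplete trailing codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    # Ambiguous reference bases (e.g. N) are not handled and raise KeyError here.
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    change = f"p.{ref_aa}{codon_index + 1}{alt_aa}"
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-lost"
    else:
        kind = "missense"
    return change, kind


# Toy sequences only; these are not real transcripts.
for cds, pos, alt in [("ATGGCCCGT", 8, "A"), ("ATGGCCCGT", 6, "T"), ("ATGTGG", 6, "A")]:
    print(cds, pos, alt, classify_snv(cds, pos, alt))
```

For example, `classify_snv("ATGGCCCGT", 8, "A")` turns the third codon CGT into CAT (Arg to His) and returns `("p.R3H", "missense")`. A scan for variants that affect functional sites would keep only the non-synonymous results.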

Highlights

Evolving sequencing technologies have exponentially increased the output of genomics data [1, 2], which has led to revolutionary discoveries in cancer biology and other biological sciences [3,4,5]
It is important to note that the UniProt curation effort is broader than curating cancer biomarkers alone; we believe that our work extends the UniProt effort
We describe how time-tested curation of sequence features through reading papers supplemented with data integration from diverse sources and through the analysis of next-generation sequencing (NGS) data can help create a comprehensive curated database of cancer-related non-synonymous single-nucleotide variations (nsSNVs), which can be of immediate use to the community
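The integration and prioritization idea can be made concrete with a short sketch. It merges per-source variant records into one entry per unique amino-acid change, flags changes that fall on an annotated functional site, and ranks the result. This is a minimal illustration under stated assumptions, not BioMuta's schema or method. The `NsSNV` record layout, the `integrate` function, the `functional_sites` lookup, and the gene names and positions are all hypothetical toy values. The sketch also assumes that "common" means reported by two or more sources and "novel" means reported by exactly one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NsSNV:
    """One amino-acid change as reported by one source (hypothetical layout)."""
    gene: str      # gene symbol or protein accession
    position: int  # 1-based residue position in the protein
    ref_aa: str    # reference amino acid (one-letter code)
    alt_aa: str    # substituted amino acid (one-letter code)
    source: str    # e.g. "COSMIC", "ClinVar", "UniProtKB", "literature", "TCGA"


def integrate(records, functional_sites):
    """Merge per-source records into one entry per unique amino-acid change.

    functional_sites maps gene -> set of residue positions annotated as
    functional sites (a hypothetical stand-in for curated sequence features).
    """
    sources = defaultdict(set)
    for r in records:
        sources[(r.gene, r.position, r.ref_aa, r.alt_aa)].add(r.source)

    merged = []
    for (gene, pos, ref, alt), srcs in sources.items():
        merged.append({
            "variant": f"{gene}:p.{ref}{pos}{alt}",
            "sources": sorted(srcs),
            # Assumption: "common" = reported by 2+ sources, "novel" = exactly one.
            "status": "common" if len(srcs) > 1 else "novel",
            "at_functional_site": pos in functional_sites.get(gene, set()),
        })
    # Rank functional-site hits first, then by number of supporting sources,
    # then alphabetically so the order is deterministic.
    merged.sort(key=lambda v: (not v["at_functional_site"], -len(v["sources"]), v["variant"]))
    return merged


# Toy input: gene names, positions and sources are placeholders, not real data.
records = [
    NsSNV("GENE_A", 175, "R", "H", "COSMIC"),
    NsSNV("GENE_A", 175, "R", "H", "ClinVar"),
    NsSNV("GENE_A", 175, "R", "H", "TCGA"),
    NsSNV("GENE_B", 1699, "R", "W", "literature"),
    NsSNV("GENE_C", 1047, "H", "R", "TCGA"),
]
sites = {"GENE_A": {175}}

for v in integrate(records, sites):
    print(v["variant"], v["status"], v["at_functional_site"], ",".join(v["sources"]))
```

Ranking functional-site hits ahead of source counts is only one possible design choice. A real curation pipeline might weight sources differently or draw on other curated evidence when choosing variants for validation.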

Summary

Introduction

Evolving sequencing technologies have exponentially increased the output of genomics data [1, 2], which has led to revolutionary discoveries in cancer biology and other biological sciences [3,4,5].

Citation details: Wu,T.-J., Shamsaddini,A., Pan,Y., et al. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

Similar Papers

COSMIC: the catalogue of somatic mutations in cancer
Nidhi Bindal ... Michael R Stratton
Genome Biology | VOL. 12 | 01 Jan 2010

Annotating Whole Genome Sequencing in COSMIC (The Catalogue of Somatic Mutations in Cancer)
C Y Kok ... S Bamford
Nature Precedings | 27 Oct 2010

Annotating Whole Genome Sequencing in COSMIC (The Catalogue of Somatic Mutations in Cancer)
D Breare ... C G Cole
Nature Precedings | 27 Oct 2010

Author response: Gallbladder adenocarcinomas undergo subclonal diversification and selection from precancerous lesions to metastatic tumors
20 Nov 2022