Abstract SY31-03: Reuse of genomics data and lessons for cancer research

Alvis Brazma

doi:10.1158/1538-7445.am2018-sy31-03

Abstract

Making data and materials used in scientific research available to others is not only a prerequisite for reproducible science and for maximizing the return on funding; it is also the very basis of scientific progress, allowing scientists to build on the previous work of their colleagues (1). In the life sciences this principle has been well established since the 1980s, when scientific journals began requiring that X-ray crystallography and DNA sequence data supporting publications be deposited in appropriate databases. In 1996, as part of the Human Genome Project, it was agreed that all sequence data would be released in publicly accessible databases within twenty-four hours of generation, an agreement known as the Bermuda principles (2). With the advent of new types of data, such as microarray data, it was soon recognized that not only the sequence or microarray data themselves are important, but also the standards for how these data are represented and the information about the samples and experiments (3). However, to enable data sharing, a properly built and funded infrastructure is needed (4). Without public data resources such as Ensembl, UniProt, and Expression Atlas, which add value to molecular data archives, modern life sciences research would be hard to imagine (5-7). The data-sharing mentality is now so firmly embedded in the ethos of the life sciences that scientists working in the field struggle to imagine that the mentality may be different in other scientific disciplines. Data sharing in medical research is a more complex and difficult problem for several reasons, including the need for data security and patient confidentiality, the complexity of representing health records, and the diversity of national legislation. However, there is also an increasing realization that sharing biomedical research data, which is now facilitated by the use of electronic health records, can be an important accelerator of biomedical research (7).
Cancer researchers are at the forefront of the data-sharing approach in biomedical research. The International Cancer Genome Consortium is completing the sequencing of genomes, transcriptomes, and epigenomes of over 20,000 patients, making all the data available to researchers alongside the essential clinical information (8). About 10% of these genomes and transcriptomes have been reanalyzed in a standardized way by the Pan-Cancer Analysis of Whole Genomes (PCAWG) group. The PCAWG project provides an important demonstration of how data integration can accelerate biomedical research. In this talk I will concentrate particularly on lessons learned from the integration of genome, transcriptome, and clinical data in this project. Although some of the PCAWG transcriptome analysis is still being finalized at the time of writing, it is already clear that integrating heterogeneous types of data from heterogeneous cancer types provides new insights about this disease, and also that such integrative analysis is challenging (9-13). I will also describe some of the experience in building infrastructure for data integration and sharing at the European Bioinformatics Institute (EMBL-EBI) and some of the data resources provided by EMBL-EBI (14,15), concentrating particularly on cancer genomics data and the benefits that data sharing brings to cancer research.
