Abstract LB-308: A novel data safe haven approach to bring analyses to the International Cancer Genome Consortium data

Francisco M De La Vega, Thomas Schlumpberger, Akshay Patel, Raja Hayek, Tal Shmaya, James Wiley, Ying Wu

doi:10.1158/1538-7445.am2015-lb-308

Abstract

To target and personalize cancer therapies to the genomic aberrations present in a particular patient's tumor, researchers need to identify the genes that drive the progression of malignant tumors. This requires analysis of somatic mutations from large samples of patients to identify driver mutations down to the “tail end” of the frequency distribution. Community genomics data sets from the TCGA and ICGC projects represent a valuable resource to which researchers can add their own data to gain statistical power in their analyses. The current problem with this approach is the highly fragmented storage of public and private data and the inefficient access to public data. Researchers spend weeks to months downloading hundreds of terabytes of data from central repositories before computations can begin. What is needed is a data “safe haven” where researchers can bring compute to the reference data without incurring bulky data transfers or duplicative storage costs, in an environment that protects the privacy of patients’ data. In collaboration with the International Cancer Genome Consortium, we developed ShareSeq, a genomic data safe haven platform that provides an informatics solution for storing, handling, and analyzing protected identifiable genomic data. This resource leverages Annai-GNOS, the technology we developed to create and manage the CGHub TCGA repository together with UCSC and that is being used in the ICGC Pan-Cancer Analysis of Whole Genomes project, and combines it with a high-performance compute environment and an array of tools to process and analyze genomic data. Built using a walled garden approach, where the data are stored, processed, and managed within the security of the system, ShareSeq avoids the complexity of assured endpoint encryption.
GeneTorrent, our fast and secure file transfer mechanism, enables researchers’ private information to be transferred into the walled garden simply and securely to combine it with the public datasets. ShareSeq differs dramatically from the traditional cloud in four features: (i) formal mechanisms and a service level agreement to store protected identifiable genomic data securely and safely, built into the system from the ground up; (ii) a design specifically for genomic computing over large shared data sets, supporting common bioinformatics workflow tools; (iii) fast download of and access to raw genomic information and its metadata; and (iv) access controls leveraging federated authentication systems that Data Access Committees use to authorize access to the restricted data. ShareSeq is initially hosting raw, normalized, and processed data from the ICGC, but we envision that over time it will host an increasing number of high-value public reference genomic datasets and add standards-based interfaces promoted by the Global Alliance for Genomics and Health to allow broader data discovery and sharing.

Citation Format: Francisco M. De La Vega, Ying Wu, Tal Shmaya, Thomas Schlumpberger, James Wiley, Akshay Patel, Raja Hayek. A novel data safe haven approach to bring analyses to the International Cancer Genome Consortium data. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr LB-308. doi:10.1158/1538-7445.AM2015-LB-308

Journal: Cancer Research	Publication Date: Aug 1, 2015
Citations: 1

Similar Papers

Abstract 51: International Cancer Genome Consortium Data Portal – A ‘one-stop-shop’ for genomic, transcriptomic, and epigenomic data
Arek Kasprzyk
Cancer Research | VOL. 71
15 Apr 2011

Abstract 130: International Cancer Genome Consortium (ICGC)
Jennifer L Jennings ... Thomas J Hudson
Cancer Research | VOL. 76
15 Jul 2016

Abstract 378: The Cancer Genome Collaboratory
Christina K Yung ... Cenk Sahinalp
Cancer Research | VOL. 77
01 Jul 2017

Abstract 2602: The ICGC data portal and its underlying open source software architecture
Junjun Zhang ... Phuong-My Do
Cancer Research | VOL. 77
01 Jul 2017
