Abstract

Genomic data sharing is increasingly recognized as critical to genomic research. The need is acute in pediatric cancer research, due to the rarity of pediatric tumor types and the paucity of pediatric cancer data, and in translational research, to assess the impact of genomic research on human health. However, genomic data sharing is hindered by an absence of standards regarding timing, patient privacy, use agreements, and data characterization and quality. At the UC Santa Cruz Treehouse Childhood Cancer Initiative (treehousegenomics.soe.ucsc.edu), we examine individual pediatric cancer tumor RNA sequencing profiles against a database of over 11,000 tumor RNA sequencing profiles. These come from public genomic datasets such as The Cancer Genome Atlas, Therapeutically Applicable Research To Generate Effective Treatments, International Cancer Genome Consortium, and Medulloblastoma Advanced Genomics International, and from pediatric cancer clinical trials with which we partner, such as those at Stanford University, UC San Francisco, Children’s Hospital of Orange County, and British Columbia Children’s Hospital. For over 18 months, we have worked systematically to enhance the Treehouse dataset by adding pediatric cancer data and presently underrepresented tumor types. The NIH and other leading funding agencies now regularly require grantees to make the genomic data they generate available to the research community, either post-publication or after an embargo period. We have combed websites and public repositories, searched PubMed, and contacted researchers directly. Finding data requires mining the literature, often with limited information, and initiating many different processes for requesting permission to use these datasets, each with different and often cumbersome data use obligations.
The combination of cryptically named datasets, multiple data types, and the practice of grouping datasets from multiple papers under a single study accession makes zeroing in on the correct dataset challenging. Downloading the genomic data is time-consuming: a dataset of fewer than 100 files can take up to a week to download under optimal conditions. Matching metadata is inconsistently available and often vague, sparse, or error-ridden. Only after months of identifying datasets, obtaining permission for use, committing to terms that restrict use and sharing, and downloading the genomic data and metadata is it possible to assess quality, and the data often prove to be of low quality. We evaluate the barriers to data sharing based on the Treehouse experience and offer guidelines for timing, use agreement standards, and data characterization and quality, to enhance data sharing and outcomes for all pediatric cancer patients.

Citation Format: Katrina Learned, Ann Durbin, Robert Currie, Holly Beale, Du Linh Lam, Theodore Goldstein, Sofie R. Salama, David Haussler, Olena Morozova, Isabel Bjork. A critical evaluation of genomic data sharing: Barriers to accessing pediatric cancer genomic datasets: a Treehouse Childhood Cancer Initiative experience [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr LB-338. doi:10.1158/1538-7445.AM2017-LB-338
