Abstract

Next-generation sequencing multi-gene panels have greatly improved the diagnostic yield and cost-effectiveness of genetic testing and are rapidly being integrated into the clinic for hereditary cancer risk. With this technology comes a dramatic increase in the volume, type and complexity of data. Yet these invaluable data are too often buried or inaccessible to researchers, especially those without strong analytical or programming skills. To effectively share comprehensive, integrated genotypic–phenotypic data, we built Color Data, a publicly available, cloud-based database that supports broad access and data literacy. The database comprises data from 50 000 individuals who were sequenced for 30 genes associated with hereditary cancer risk and provides useful information on allele frequency and variant classification, as well as associated phenotypic information such as demographics and personal and family history. Our user-friendly interface allows researchers to easily execute their own queries with filtering, and the results of queries can be shared and/or downloaded. The rapid and broad dissemination of these research results will help increase the value of, and reduce the waste in, scientific resources and data. Furthermore, the database is able to quickly scale and support integration of additional genes and human hereditary conditions. We hope that this database will help researchers and scientists explore genotype–phenotype correlations in hereditary cancer, identify novel variants for functional analysis and enable data-driven drug discovery and development.

Highlights

  • Next-generation sequencing (NGS) technologies continue to revolutionize the field of genomics as low-cost, high-throughput platforms with high sensitivity

  • Users can select filter values in the dropdown list or by text typing with autocomplete, with the exception of the ‘Variant’ filter values that can only be selected by text typing with autocomplete using Human Genome Variation Society (HGVS) nomenclature

  • VUS, variant of uncertain significance. (a) Unknown includes information not reported. (b) The CDKN2A locus encodes two gene products, p14ARF and p16INK4a. (c) Filter values for ‘Variant’ can only be selected by text typing with autocomplete using HGVS nomenclature


Summary

Introduction

Next-generation sequencing (NGS) technologies continue to revolutionize the field of genomics as low-cost, high-throughput platforms with high sensitivity. Over the past few years, NGS multi-gene panels have been increasingly used in both the clinic and research laboratories for genetic screening, diagnosis and assessment of hereditary conditions, including cancer [1,2,3]. The study of genomic data in these cases can help reveal genotype–phenotype correlations in hereditary cancer, identify novel variants for functional analysis and enable data-driven drug discovery and development. Public and commercial cancer-specific databases have also been developed for genomic data and provide useful information on gene annotation, allele frequency and known or predicted functional consequences of variants [8,9,10,11]. However, associated specific clinical information, such as demographics and personal and family history, is not always available, and independently linking large sets of genotypic and phenotypic information often requires knowledge of programming languages and database intelligence or expensive local software.
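
To illustrate the kind of programming that linking genotypic and phenotypic records by hand can involve, the following is a minimal Python sketch. It is not the Color Data implementation or its interface: the field names (`participant_id`, `gene`, `classification`, `personal_history`), the helper functions and the three toy records are all hypothetical placeholders invented for illustration and are not drawn from the database.

```python
# Hypothetical toy records for illustration only; not real Color Data entries.
genotypes = [
    {"participant_id": "P1", "gene": "BRCA1", "classification": "Pathogenic"},
    {"participant_id": "P2", "gene": "BRCA2", "classification": "VUS"},
    {"participant_id": "P3", "gene": "BRCA1", "classification": "VUS"},
]
phenotypes = [
    {"participant_id": "P1", "personal_history": "Breast cancer"},
    {"participant_id": "P2", "personal_history": "None reported"},
    {"participant_id": "P3", "personal_history": "Ovarian cancer"},
]


def link_records(genotype_rows, phenotype_rows):
    """Join genotype rows to phenotype rows on the shared participant_id."""
    pheno_by_id = {row["participant_id"]: row for row in phenotype_rows}
    linked = []
    for g in genotype_rows:
        p = pheno_by_id.get(g["participant_id"])
        if p is not None:  # skip genotypes that have no phenotype record
            linked.append({**g, **p})
    return linked


def filter_records(records, **criteria):
    """Keep only records whose fields equal every given filter value."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


linked = link_records(genotypes, phenotypes)
brca1_vus = filter_records(linked, gene="BRCA1", classification="VUS")
```

Even this small join-and-filter step assumes that both tables share a clean identifier and a consistent vocabulary, which is the sort of groundwork a query interface with predefined filters spares the researcher.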

