Abstract

Public transcriptomic assets in the nuclear receptor (NR) signaling field hold considerable collective potential for exposing underappreciated aspects of NR regulation of gene expression. This potential is undermined, however, by a series of enduring informatic pain points that impede the routine re-use of these datasets. Here we describe a coordinated biocuration and web development approach to redress this situation that is closely aligned with ideals articulated in the FAIR (findable, accessible, interoperable, re-usable) principles on data stewardship. To improve findability, biocurators engage authors of studies in collaborating journals to secure datasets for deposition in public archives. Annotated derivatives of the archived datasets are assigned digital object identifiers and regulatory molecule identifiers that support persistent linkages between datasets and their associated research articles, integration in relevant records in gene and small molecule knowledgebases, and indexing by dataset search engines. To enhance their accessibility and interoperability, datasets are visualizable in responsively designed web pages and retrievable either as machine-readable spreadsheets or through an application programming interface. Re-use of the datasets is supported by their interrogation as a universe of data points through the Transcriptomine search engine, highlighting transcriptional intersections between NR signaling pathways, physiological processes and disease states. We illustrate the value of our approach in connecting disparate research communities with a use case of persistent interoperability between the Nuclear Receptor Signaling Atlas and the Pharmacogenomics Knowledgebase. Our FAIR-aligned model demonstrates the enduring value of discovery-scale datasets that accrues from their systematic compilation, biocuration and distribution across the digital biomedical research enterprise.

Highlights

  • Signal transduction by members of the nuclear receptor (NR) superfamily of transcription factors encompasses interactions with small molecule ligands and coregulators that control cell- and tissue-specific transcriptomes in a wide variety of developmental and physiological contexts (McKenna and O’Malley, 2002, Mangelsdorf et al, 1995)

  • Datasets are not consistently archived (Ochsner et al, 2008) and those that are archived in repositories such as the US National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) (Barrett et al, 2009) and European Bioinformatics Institute (EBI) ArrayExpress (Kolesnikov et al, 2015) are frequently under-annotated and poorly exposed for discovery by researchers

  • End users expressed satisfaction with the site overall but requested an increased emphasis on ‘omics dataset integration and analysis tooling. Given that transcriptomic datasets represent the most abundant ‘omics modality in the field of nuclear receptor signaling, we set out on a systematic effort to enhance their re-use in the field

Introduction

Signal transduction by members of the nuclear receptor (NR) superfamily of transcription factors encompasses interactions with small molecule ligands and coregulators that control cell- and tissue-specific transcriptomes in a wide variety of developmental and physiological contexts (McKenna and O’Malley, 2002, Mangelsdorf et al, 1995). What NR pathways regulate my gene of interest? What genes are most consistently regulated by a given NR pathway, and how do these targets differ between different tissues?

Ochsner et al: A FAIR-Based Approach to Enhancing the Discovery and Re-Use of Transcriptomic Data Assets for Nuclear Receptor Signaling Pathways

Although the field has produced transcriptomic profiling datasets involving perturbations of NR signaling pathways, numerous factors combine to complicate re-use of these datasets to answer these and other biological questions. Datasets are not consistently archived (Ochsner et al, 2008), and those that are archived in repositories such as the US National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) (Barrett et al, 2009) and European Bioinformatics Institute (EBI) ArrayExpress (Kolesnikov et al, 2015) are frequently under-annotated and poorly exposed for discovery by researchers. Apart from the wasted effort, the current period of financial austerity in research funding makes a strong case for the development of tools that will provide for more effective and efficient use of existing, but currently peripheral, data points.
