Abstract

Background: Designed to build upon Genome-Wide Association Study (GWAS) findings, the NIH Common Fund’s Genotype-Tissue Expression (GTEx) project aims to study gene expression and regulation across multiple human tissues (30+ tissue types) from approximately 1000 healthy normal donors. It is expected to provide valuable insights into gene regulation and its tissue specificity, identify correlations between genetic variation and variation in gene expression levels as expression quantitative trait loci (eQTLs), and help explain inherited susceptibility to disease.

Purpose/Objective: To meet the GTEx requirements for collecting and tracking high-quality biospecimen samples, a custom-built software system named Comprehensive Data Resources (CDR) was developed to support the sample collection workflow, clinical data entry, case management, and review and curation of study data.

Materials and Methods: CDR is built with a combination of technologies including Grails, Oracle, Groovy, jQuery, and Apache Solr.

Results: The CDR provides secure user access to case and sample data based on pre-defined roles and privileges. Personally Identifiable Information (PII) and Protected Health Information (PHI) are restricted to a limited data set (LDS) and to authorized users through dynamic content redaction. Intuitive graphical user interfaces for the Biospecimen Source Sites (BSS) streamline the data entry workflow by strictly following the SOPs for sample collection and processing. Contextual automated data checks and business rule validations confirm data integrity and SOP adherence simultaneously. Web service APIs allow the Pathology Resource Center to access digital imaging data from tissue slides housed remotely at the Comprehensive Biospecimen Resource (CBR). APIs also connect to the CBR’s LIMS for real-time sample inventory data. De-identified GTEx data is shared with the Broad Institute (LDACC) through a private API before final release to dbGaP. The reporting and analytics module supports data analysis and aggregation, report generation, and real-time operational data snapshots.

Conclusions: CDR is a distributed web-based system designed to support GTEx operations from the pilot phase through full scale-up. It manages and maintains multi-dimensional data models around each donor case (more than 500 data elements per case on average). As an efficient case management tool capable of connecting to various remote informatics systems, CDR could be adapted for the broader biobanking community, with the flexibility to build user-defined workflows in the system.
