Abstract

e18094 Background: Gen3 is an open-source software platform for developing and operating data commons. Gen3 systems are now used by a variety of institutions and agencies to share and analyze large biomedical datasets, including clinical and genomic data. One challenge of working with these datasets is that researchers in different studies and fields use disparate clinical data standards. We have addressed this challenge in several ways.

Methods: Detailed specifications and features of Gen3 are available at https://gen3.org/, with code hosted on GitHub (https://github.com/UC-cdis).

Results: The Gen3 data model is a graphical representation of the different nodes, or classes, of data that have been collected. Examples include diagnosis, demographic, exposure, and family history. The properties and values on each node are controlled by the data dictionary specified by the data commons creator. While each commons may have a unique data model and dictionary, specifying external standards makes it easier to submit new data and helps data consumers interpret results. A variety of external references can be supported; here we demonstrate the use of the National Cancer Institute Thesaurus (NCIt). NCIt provides reference terminologies and biomedical standards that contain a rich set of terms, codes, definitions, and concepts. Using the same reference standards across commons allows clinical data to be exported from one commons to another. The Portable Format for Biomedical Data (PFB) was created to facilitate data export by allowing the data dictionary schema and the raw data to be compressed and exported together. This new file format, which uses Avro serialization, is small, fast, and easy to modify, and it enables simple data export and import. PFB can also house entire external reference ontologies, and its references are easy to update as changes are introduced.

Conclusions: We have shown how the Gen3 data model, the use of external reference standards for clinical data, and the PFB export/import format enable the harmonization of clinical data across different data commons.
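The harmonization idea described above can be illustrated with a minimal sketch. It is not the PFB format, Avro, or any Gen3 API: it uses plain JSON from the Python standard library, and the dictionaries `DICT_A` and `DICT_B`, the functions `export_bundle` and `import_bundle`, and the `REF:` codes are all hypothetical placeholders (they are not real NCIt concept codes). The sketch assumes two commons name a property and its values differently but annotate both with the same external reference codes, and that the exported file carries the source dictionary along with the records.

```python
import json

# Hypothetical data dictionaries for two commons. Each property and each
# permitted value points at a shared external reference code. The "REF:"
# codes are placeholders, NOT real NCIt concept codes.
DICT_A = {
    "demographic": {
        "gender": {
            "term": "REF:0001",
            "values": {"female": "REF:0002", "male": "REF:0003"},
        }
    }
}
DICT_B = {
    "demographic": {
        "sex": {
            "term": "REF:0001",
            "values": {"F": "REF:0002", "M": "REF:0003"},
        }
    }
}


def export_bundle(dictionary, records):
    """Serialize the dictionary together with the records it describes."""
    return json.dumps({"dictionary": dictionary, "records": records})


def import_bundle(payload, target_dictionary):
    """Translate exported records into the target commons' local vocabulary.

    Properties and values are matched through their shared reference codes.
    Properties with no counterpart in the target dictionary are skipped;
    a value missing from either dictionary raises KeyError.
    """
    bundle = json.loads(payload)
    source = bundle["dictionary"]
    converted_records = []
    for record in bundle["records"]:
        node = record["node"]
        # Index the target node's properties by their reference code.
        target_by_code = {
            spec["term"]: (name, spec)
            for name, spec in target_dictionary[node].items()
        }
        converted = {}
        for prop, value in record["data"].items():
            source_spec = source[node][prop]
            if source_spec["term"] not in target_by_code:
                continue
            target_name, target_spec = target_by_code[source_spec["term"]]
            # Local value -> shared code -> target commons' local value.
            value_code = source_spec["values"][value]
            code_to_local = {
                code: local for local, code in target_spec["values"].items()
            }
            converted[target_name] = code_to_local[value_code]
        converted_records.append({"node": node, "data": converted})
    return converted_records


payload = export_bundle(
    DICT_A, [{"node": "demographic", "data": {"gender": "female"}}]
)
print(import_bundle(payload, DICT_B))
```

Because the source dictionary travels with the records, the importing side needs only its own dictionary and the shared codes to perform the translation.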

