Abstract
With the move toward global, Internet-enabled science, there is an inherent need to capture, store, aggregate and search scientific data across a large corpus of heterogeneous data silos. As a result, standards development is needed to create an infrastructure capable of representing the diverse nature of scientific data. This paper describes a fundamental data model for scientific data that can be applied to data currently stored in any format, and an associated ontology that affords semantic representation of the structure of scientific data (and its metadata), upon which discipline-specific semantics can be applied. Application of this data model to experimental and computational chemistry data is presented, implemented using JavaScript Object Notation for Linked Data (JSON-LD). Full examples are available at the project website (Chalk in SciData: a scientific data model. http://stuchalk.github.io/scidata/, 2016).
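As a rough illustration of a JSON-LD document that keeps the description of how data were obtained, what they describe, and the values themselves in separate sections, the Python sketch below assembles such a record using only the standard `json` module. The section names `methodology`, `system` and `dataset`, the `sdo:` prefix, the example.org IRIs and all values are assumptions made for this sketch; the project website is the authority on the actual SciData structure.

```python
import json

# Hypothetical JSON-LD context mapping short terms to IRIs. The "sdo" prefix
# and example.org URL are placeholders, not the published SciData context.
CONTEXT = {
    "sdo": "https://example.org/scidata/ontology#",
    "methodology": "sdo:methodology",
    "system": "sdo:system",
    "dataset": "sdo:dataset",
}


def make_record(record_id, method, system, data):
    """Assemble a JSON-LD document with separate method, system and data sections."""
    return {
        "@context": CONTEXT,
        "@id": record_id,
        "@type": "sdo:scientificData",
        "methodology": method,  # how the data were produced
        "system": system,       # what the data describe
        "dataset": data,        # the values themselves
    }


# Illustrative placeholder content only.
record = make_record(
    "https://example.org/data/1",
    {"@type": "sdo:measurement", "technique": "UV-Vis spectroscopy"},
    {"@type": "sdo:substance", "name": "caffeine"},
    {"@type": "sdo:datapoint", "quantity": "absorbance", "value": 0.42},
)

print(json.dumps(record, indent=2))
```

Because the context maps each section name to an IRI, a JSON-LD processor can interpret the same record as linked data, while ordinary JSON tooling can still read it as plain nested objects.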
Highlights
For almost 40 years, scientists have been storing scientific data on computers
This paper describes a generic scientific data model (SDM), or framework, for scientific data, derived from (1) the common structure of scientific articles, (2) the needs of electronic notebooks to capture scientific research data and metadata, and (3) the clear need to organize scientific data and its contextual descriptors
Considerations for a scientific data model: what is scientific data? To appreciate what scientific data is, we took a step back and looked at the scientific process to abstract the important aspects that underpin the framework of what scientists do and how they do it
Summary
For almost 40 years, scientists have been storing scientific data on computers. With the advancement of Internet technologies and of online and local storage capabilities, the options for collecting and storing scientific information have become virtually unlimited. With all these advancements, science faces an increasingly important issue of interoperability. Although the Internet has promoted the creation of open standards in many areas, scientific data has, in a sense, been left behind because of its inherent complexity. The problem is the contextualization of scientific data: the metadata that describes the system it applies to, the way it was investigated, the scientists who determined it, and the quality of the measurements.
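The four kinds of context listed above (the system, how it was investigated, who determined it, and measurement quality) can be pictured as fields that travel with each value. The hypothetical Python sketch below reports which of these descriptors a data point lacks; the key names `system`, `method`, `scientists` and `quality`, the `DataPoint` class and the example values are invented for illustration and are not part of any published schema.

```python
from dataclasses import dataclass, field

# Hypothetical key names for the four kinds of context the summary lists.
REQUIRED_CONTEXT = ("system", "method", "scientists", "quality")


@dataclass
class DataPoint:
    value: float
    unit: str
    context: dict = field(default_factory=dict)


def missing_context(point: DataPoint) -> list[str]:
    """Return the descriptors that are absent or empty for this data point."""
    return [key for key in REQUIRED_CONTEXT if not point.context.get(key)]


# A bare number versus the same number with illustrative context attached.
bare = DataPoint(78.4, "degC")
described = DataPoint(
    78.4,
    "degC",
    {
        "system": "ethanol, boiling point at 1 atm",
        "method": "ebulliometry",
        "scientists": ["A. Researcher"],  # placeholder name
        "quality": {"uncertainty": 0.1},  # placeholder value
    },
)

print(missing_context(bare))       # ['system', 'method', 'scientists', 'quality']
print(missing_context(described))  # []
```

The bare value is numerically identical to the described one, yet without its context it cannot be reliably compared or aggregated with data from another source, which is the interoperability gap the summary describes.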