Abstract
The Core Scientific Dataset (CSD) model with JavaScript Object Notation (JSON) serialization is presented as a lightweight, portable, and versatile standard for intra- and interdisciplinary scientific data exchange. This model supports datasets with a $p$-component dependent variable, $\{U_0, \ldots, U_q, \ldots, U_{p-1}\}$, discretely sampled at $M$ unique points in a $d$-dimensional independent variable $(X_0, \ldots, X_k, \ldots, X_{d-1})$ space. Moreover, this sampling is over an orthogonal grid, regular or rectilinear, where the principal coordinate axes of the grid are the independent variables. It can also hold correlated datasets assuming the different physical quantities (dependent variables) are sampled on the same orthogonal grid of independent variables. The model encapsulates the dependent variables’ sampled data values and the minimum metadata needed to accurately represent this data in an appropriate coordinate system of independent variables. The CSD model can serve as a re-usable building block in the development of more sophisticated portable scientific dataset file standards.
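The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the CSD specification: the key names (`dimensions`, `dependent_variables`, `coordinates`, `components`) and the helper `make_dataset` are hypothetical, chosen only to show a p-component dependent variable sampled at M points of a d-dimensional orthogonal grid and then serialized as JSON.

```python
import json
from math import prod


def make_dataset(dimensions, dependent_variables):
    """Bundle grid axes and sampled values into one JSON-ready dict.

    All key names are hypothetical and are not taken from the CSD specification.
    """
    # M is the number of grid points: the product of the axis lengths.
    m = prod(len(dim["coordinates"]) for dim in dimensions)
    for var in dependent_variables:
        for component in var["components"]:
            if len(component) != m:
                raise ValueError(
                    f"expected {m} samples per component, got {len(component)}"
                )
    return {"dimensions": dimensions, "dependent_variables": dependent_variables}


# d = 2 independent variables: X0 is evenly spaced, X1 is rectilinear (uneven spacing).
dimensions = [
    {"description": "X0, regular axis", "coordinates": [0.0, 0.5, 1.0]},
    {"description": "X1, rectilinear axis", "coordinates": [0.0, 0.1, 1.0, 10.0]},
]

# One dependent variable with p = 2 components, each sampled at all M = 3 * 4 = 12 points.
dependent_variables = [
    {
        "description": "two-component signal",
        "components": [list(range(12)), [0.5 * i for i in range(12)]],
    },
]

dataset = make_dataset(dimensions, dependent_variables)
serialized = json.dumps(dataset)   # plain JSON text, portable across languages
restored = json.loads(serialized)  # round-trips back to the same structure
```

Storing several entries in `dependent_variables` against the same `dimensions` list is one way to picture the correlated-dataset case, where different physical quantities share a single grid.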
Highlights
A frustrating and common problem faced by scientists in many disciplines is the lack of a portable scientific dataset format and universal standards for exchanging and archiving multidimensional datasets—both experimental and computational
Of particular importance in the Core Scientific Dataset (CSD) model is the ScalarQuantity type, which is composed of a numerical value and any valid SI unit symbol or any number of accepted non-SI unit symbols
The description attribute appears in nearly every CSD model object and holds a UTF-8 encoded string describing that instance of the model object
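To make the ScalarQuantity highlight concrete, here is a small Python sketch of a value-plus-unit type. The class layout, the `parse` method, and the "number, space, unit symbol" string form are assumptions made for illustration, not the representation defined by the CSD model. The sketch also does not check that the unit symbol is a valid SI or accepted non-SI symbol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalarQuantity:
    """Hypothetical stand-in for a numerical value paired with a unit symbol."""

    value: float
    unit: str  # unit symbol such as "m", "Hz", or "kg"; empty if dimensionless

    @classmethod
    def parse(cls, text: str) -> "ScalarQuantity":
        # Split on the first run of whitespace: the number comes first,
        # and everything after it is treated as the unit symbol.
        parts = text.strip().split(maxsplit=1)
        if not parts:
            raise ValueError("empty quantity string")
        value = float(parts[0])  # raises ValueError if not a number
        unit = parts[1] if len(parts) > 1 else ""
        return cls(value, unit)

    def __str__(self) -> str:
        # Serialize back to the assumed "value unit" string form.
        return f"{self.value} {self.unit}".strip()


q = ScalarQuantity.parse("2.5 Hz")
```

Keeping the unit next to the number in a single string means the quantity survives a JSON round trip as an ordinary string value, at the cost of having to parse it on import.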
Summary
A frustrating and common problem faced by scientists in many disciplines is the lack of a portable scientific dataset format and universal standards for exchanging and archiving multidimensional datasets, both experimental and computational. Because of the resulting incompatibilities, many scientists resort to comma-separated values (CSV) files for dataset exchange and archival. Others rely on specialized library packages to import datasets from vendor-specific file formats into their preferred programming languages, such as MATLAB, Python, R, or Java, or use third-party software for dataset imports. This is only a temporary fix: it merely delays the original problem, because the dataset files are then translated to yet another third-party software or user-specific file format, again often with metadata loss. We envision the CSD model as a re-usable building block in a hierarchical description of more sophisticated portable scientific dataset file standards.