Core Scientific Dataset Model: A lightweight and portable model and file format for multi-dimensional scientific data

Deepansh J Srivastava, Philip J Grandinetti, Dominique Massiot, Thomas Vosegaard

doi:10.1371/journal.pone.0225953

Abstract

The Core Scientific Dataset (CSD) model with JavaScript Object Notation (JSON) serialization is presented as a lightweight, portable, and versatile standard for intra- and interdisciplinary scientific data exchange. This model supports datasets with a $p$-component dependent variable, $\{U_0, \ldots, U_q, \ldots, U_{p-1}\}$, discretely sampled at $M$ unique points in a $d$-dimensional independent variable $(X_0, \ldots, X_k, \ldots, X_{d-1})$ space. Moreover, this sampling is over an orthogonal grid, regular or rectilinear, where the principal coordinate axes of the grid are the independent variables. It can also hold correlated datasets, provided the different physical quantities (dependent variables) are sampled on the same orthogonal grid of independent variables. The model encapsulates the dependent variables' sampled data values and the minimum metadata needed to accurately represent this data in an appropriate coordinate system of independent variables. The CSD model can serve as a reusable building block in the development of more sophisticated portable scientific dataset file standards.
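The abstract's central idea, a $p$-component dependent variable sampled at $M$ points of an orthogonal grid whose axes are the independent variables, can be sketched in Python. This is a minimal illustration under stated assumptions, not the CSD model's normative schema: the class names, the field names (`count`, `increment`, `offset`, `values`), the JSON key layout, and the choice that $X_0$ varies fastest when the grid is flattened are all hypothetical. A regular axis is generated from a count, increment, and offset; a rectilinear axis is given by explicit monotonic coordinates; $M$ is the product of the axis counts, and every component of every dependent variable must hold exactly $M$ values.

```python
import json
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple, Union


@dataclass
class LinearDimension:
    """Regular axis: X_k[i] = offset + i * increment (hypothetical field names)."""
    count: int
    increment: float
    offset: float = 0.0
    label: str = ""

    def coordinates(self) -> List[float]:
        return [self.offset + i * self.increment for i in range(self.count)]

    def to_dict(self) -> dict:
        return {"type": "linear", "count": self.count, "increment": self.increment,
                "offset": self.offset, "label": self.label}


@dataclass
class MonotonicDimension:
    """Rectilinear axis: explicit coordinates, assumed here to be strictly increasing."""
    values: List[float]
    label: str = ""

    def __post_init__(self) -> None:
        if any(b <= a for a, b in zip(self.values, self.values[1:])):
            raise ValueError("coordinates must be strictly increasing")

    @property
    def count(self) -> int:
        return len(self.values)

    def coordinates(self) -> List[float]:
        return list(self.values)

    def to_dict(self) -> dict:
        return {"type": "monotonic", "coordinates": self.values, "label": self.label}


Dimension = Union[LinearDimension, MonotonicDimension]


@dataclass
class DependentVariable:
    """A p-component dependent variable; every component stores M sampled values."""
    name: str
    components: List[List[float]]


@dataclass
class Dataset:
    dimensions: List[Dimension]
    dependent_variables: List[DependentVariable]
    description: str = ""

    def total_points(self) -> int:
        # M is the product of the dimension counts on an orthogonal grid.
        m = 1
        for dim in self.dimensions:
            m *= dim.count
        return m

    def grid_points(self) -> List[Tuple[float, ...]]:
        # All (X_0, ..., X_{d-1}) tuples, with X_0 varying fastest (an assumed ordering).
        axes = [dim.coordinates() for dim in reversed(self.dimensions)]
        return [tuple(reversed(p)) for p in product(*axes)]

    def validate(self) -> None:
        m = self.total_points()
        for dv in self.dependent_variables:
            for q, comp in enumerate(dv.components):
                if len(comp) != m:
                    raise ValueError(f"{dv.name}[{q}] has {len(comp)} values, expected {m}")

    def to_json(self) -> str:
        # Hypothetical key layout; only the axis descriptions are stored, not all M tuples.
        self.validate()
        return json.dumps({
            "description": self.description,
            "dimensions": [dim.to_dict() for dim in self.dimensions],
            "dependent_variables": [
                {"name": dv.name, "components": dv.components}
                for dv in self.dependent_variables
            ],
        }, indent=2)


# Usage: a 3 x 3 grid (one regular axis, one rectilinear axis) with a 2-component variable.
x0 = LinearDimension(count=3, increment=0.5, label="time")
x1 = MonotonicDimension(values=[1.0, 2.0, 5.0], label="field")
ds = Dataset([x0, x1], [DependentVariable("signal", [[float(i) for i in range(9)], [0.0] * 9])])
print(ds.total_points())      # 9
print(ds.grid_points()[:2])   # [(0.0, 1.0), (0.5, 1.0)]
```

Because the grid is orthogonal, only the $d$ one-dimensional coordinate arrays need to be stored rather than all $M$ coordinate tuples, which is one reason a model of this kind can stay lightweight.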

Highlights

A frustrating and common problem faced by scientists in many disciplines is the lack of a portable scientific dataset format and of universal standards for exchanging and archiving multidimensional datasets, both experimental and computational.
Of particular importance in the Core Scientific Dataset (CSD) model is the ScalarQuantity type, which is composed of a numerical value and any valid SI unit symbol or any number of accepted non-SI unit symbols.
The description attribute appears in nearly every CSD model object and holds a UTF-8-encoded string describing the instance of the model object.
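The ScalarQuantity highlight can be illustrated with a small parser. This sketch assumes a quantity is written as a string of the form `"<number> <unit>"` (for example `"1.5 kHz"`). The Python class below is a hypothetical stand-in named after the model type; its `parse` and `to_base` methods and the tiny prefix and unit tables are illustrative assumptions covering only a handful of symbols, far from the full set of SI and accepted non-SI units the model allows. A real implementation would delegate unit handling to a complete units library.

```python
import re
from dataclasses import dataclass

# Tiny illustrative subset of SI prefixes (both micro sign U+00B5 and Greek mu U+03BC).
PREFIXES = {"G": 1e9, "M": 1e6, "k": 1e3, "m": 1e-3,
            "\u00b5": 1e-6, "\u03bc": 1e-6, "n": 1e-9}
# Hypothetical whitelist of unprefixed unit symbols.
BASE_UNITS = {"s", "Hz", "m", "T", "K"}

# A number (optionally signed, with decimal point and exponent) followed by a unit token.
_PATTERN = re.compile(r"^\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*(\S*)\s*$")


@dataclass(frozen=True)
class ScalarQuantity:
    value: float
    unit: str

    @classmethod
    def parse(cls, text: str) -> "ScalarQuantity":
        match = _PATTERN.match(text)
        if match is None:
            raise ValueError(f"not a scalar quantity: {text!r}")
        return cls(float(match.group(1)), match.group(2))

    def to_base(self) -> "ScalarQuantity":
        """Strip a single SI prefix, e.g. '10 µs' becomes about 1e-5 s."""
        if self.unit == "" or self.unit in BASE_UNITS:
            return self
        prefix, base = self.unit[0], self.unit[1:]
        if prefix in PREFIXES and base in BASE_UNITS:
            return ScalarQuantity(self.value * PREFIXES[prefix], base)
        raise ValueError(f"unsupported unit: {self.unit!r}")

    def __str__(self) -> str:
        # Serialize back to the assumed "<number> <unit>" string form.
        return f"{self.value} {self.unit}".strip()


q = ScalarQuantity.parse("1.5 kHz")
print(q, "->", q.to_base())   # 1.5 kHz -> 1500.0 Hz
```

Keeping the number and its unit together in one value, rather than storing bare floats, is what lets a reader reinterpret the coordinates without guessing the units.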

Summary

Introduction

A frustrating and common problem faced by scientists in many disciplines is the lack of a portable scientific dataset format and of universal standards for exchanging and archiving multidimensional datasets, both experimental and computational. As a result of such risks and incompatibilities, many scientists resort to comma-separated values (CSV) files for dataset exchange and archival. Others rely on specialized library packages to import datasets from vendor-specific file formats into their favorite programming languages, such as Matlab, Python, R, or Java, or use third-party software for dataset imports. This is only a temporary fix: it merely postpones the original problem, because the dataset files are translated into yet another third-party software or user-specific file format, again often with metadata loss. We envision the CSD model as a reusable building block in a hierarchical description of more sophisticated portable scientific dataset file standards.

Overview of CSD model

UML class diagram

CSDM object

Dimension object

DependentVariable object

Generic application objects—Beyond the CSD model

JSON file-serialization

Conclusions


Journal: PLOS ONE	Publication Date: Jan 2, 2020
Citations: 12	License type: CC BY 4.0
