OpenDataPsy: An Open-Data Repository with Standardized Storage and Description for Research in Psychiatry

Chloé Saint-Dizier, Majda Zaanouar, Paul Quindroit, Alina Amariei, Antoine Lamer

doi:10.3233/shti230288

Abstract

Sharing health data could avoid duplication of effort in data collection, reduce unnecessary costs in future studies, and encourage collaboration and data flow within the scientific community. Several repositories from national institutions or research teams have made their datasets available. These data are mainly aggregated at a spatial or temporal level, or dedicated to a specific field. The objective of this work is to propose a standardized storage and description of open datasets for research purposes. To this end, we selected 8 publicly accessible datasets covering the fields of demographics, employment, education and psychiatry. We then studied the format, nomenclature (i.e., file and variable names, and modalities of recurrent qualitative variables) and descriptions of these datasets, and we proposed a common, standardized format and description. We made these datasets available in an open GitLab repository. For each dataset, we provide the raw data file in its original format, the cleaned data file in CSV format, the variable descriptions, the data management script and the descriptive statistics. Descriptive statistics are generated according to the type of each variable, as previously documented in the variable descriptions. After one year of use, we will evaluate with users whether the standardization of the datasets is relevant and how they use the datasets in practice.
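To illustrate what "statistics generated according to the documented variable type" can mean, here is a minimal sketch, not the authors' actual script. It assumes a hypothetical variable description that maps each column of a cleaned CSV file to either "quantitative" or "qualitative"; the column names, type labels and sample values are made up for illustration. Quantitative variables receive summary measures, while qualitative variables receive modality counts.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical variable description: column name -> documented type.
# Names and types are illustrative assumptions, not taken from the repository.
VARIABLE_TYPES = {
    "region": "qualitative",
    "year": "qualitative",
    "value": "quantitative",
}


def describe(rows, variable_types):
    """Return descriptive statistics for each documented variable.

    Quantitative variables get n, missing, mean, SD, median, min and max;
    qualitative variables get n, missing and the frequency of each modality.
    Empty cells are counted as missing.
    """
    summary = {}
    for name, vtype in variable_types.items():
        raw = [row.get(name) or "" for row in rows]
        present = [v for v in raw if v.strip() != ""]
        missing = len(raw) - len(present)
        if vtype == "quantitative":
            values = [float(v) for v in present]
            summary[name] = {
                "n": len(values),
                "missing": missing,
                "mean": statistics.mean(values) if values else None,
                "sd": statistics.stdev(values) if len(values) > 1 else None,
                "median": statistics.median(values) if values else None,
                "min": min(values) if values else None,
                "max": max(values) if values else None,
            }
        elif vtype == "qualitative":
            summary[name] = {
                "n": len(present),
                "missing": missing,
                "counts": dict(Counter(present)),
            }
        else:
            raise ValueError(f"Unknown variable type for {name!r}: {vtype!r}")
    return summary


# Usage with a small, made-up cleaned CSV (one missing quantitative value).
SAMPLE_CSV = """region,year,value
A,2020,20.5
A,2021,21.0
B,2020,
B,2021,15.5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = describe(rows, VARIABLE_TYPES)
print(stats["value"])
print(stats["region"])
```

Keeping the type information in a separate description, rather than inferring it from the data, means a code such as a year or a region number is summarized as a category instead of being averaged.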

Full Text