Standardized Description of the Feature Extraction Process to Transform Raw Data Into Meaningful Information for Enhancing Data Reuse: Consensus Study

Antoine Lamer, Marc Cuggia, Emmanuel Chazard, Thibaut Balcaen, Matthieu Doutreligne, William Gacquer, Mathilde Fruchart, Benjamin Popoff, Anaïs Payen, Nicolas Paris, Guillaume Bouzillé

doi:10.2196/38936

Abstract

Despite the many opportunities that data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and must be computed afterwards from raw data by defining an algorithm.

The main purpose of this article is to present a standardized description of the steps and transformations required during the feature extraction process when conducting retrospective observational studies. A secondary objective is to identify how the features could be stored in the schema of a data warehouse.

This study involved the following 3 main steps: (1) the collection of relevant study cases related to feature extraction and based on the automatic and secondary use of data; (2) the standardized description of raw data, steps, and transformations that were common to the study cases; and (3) the identification of an appropriate table to store the features in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).

We interviewed 10 researchers from 3 French university hospitals and a national institution, who were involved in 8 retrospective and observational studies. Based on these studies, 2 states (track and feature) and 2 transformations (track definition and track aggregation) emerged. A "track" is a time-dependent signal or period of interest, defined by a statistical unit, a value, and 2 milestones (a start event and an end event). A "feature" is time-independent, high-level information with dimensionality identical to the statistical unit of the study, defined by a label and a value. In a feature, the time dimension has become implicit in the value or name of the variable. We propose the 2 tables "TRACK" and "FEATURE" to store variables obtained in feature extraction and to extend the OMOP CDM.

We propose a standardized description of the feature extraction process, which combines the 2 steps of track definition and track aggregation. By dividing feature extraction into these 2 steps, the difficulty is handled during track definition. The standardization of tracks requires great expertise with regard to the data but allows the application of an infinite number of complex transformations. In contrast, track aggregation is a very simple operation with a finite number of possibilities. A complete description of these steps could enhance the reproducibility of retrospective studies.
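As a rough illustration of the 2 states and 2 transformations described in the abstract, the following Python sketch models a track and a feature as small records, derives tracks from timestamped raw values (track definition), and then collapses them into time-independent features (track aggregation). All names (`Track`, `Feature`, `define_tracks`, `aggregate_tracks`), the threshold rule, and the sample values are hypothetical choices made for illustration; they are not taken from the article and do not reflect the columns of the proposed "TRACK" and "FEATURE" tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Track:
    """Time-dependent period of interest (hypothetical field names)."""
    unit_id: int      # statistical unit, e.g. a hospital stay
    label: str
    value: float
    start: datetime   # start milestone
    end: datetime     # end milestone


@dataclass(frozen=True)
class Feature:
    """Time-independent information, one value per statistical unit."""
    unit_id: int
    label: str
    value: float


def define_tracks(unit_id: int, samples: List[Tuple[datetime, float]],
                  threshold: float, label: str) -> List[Track]:
    """Track definition (assumed rule): each run of consecutive samples
    below `threshold` becomes one track whose value is the run minimum."""
    tracks: List[Track] = []
    run: List[Tuple[datetime, float]] = []

    def close_run() -> None:
        # The first and last timestamps of the run serve as the 2 milestones.
        tracks.append(Track(unit_id, label, min(v for _, v in run),
                            run[0][0], run[-1][0]))

    for ts, val in sorted(samples):
        if val < threshold:
            run.append((ts, val))
        elif run:
            close_run()
            run = []
    if run:
        close_run()
    return tracks


def aggregate_tracks(unit_id: int, tracks: List[Track], label: str) -> List[Feature]:
    """Track aggregation: simple summaries in which time is now implicit
    in the feature name or value (count, total duration)."""
    own = [t for t in tracks if t.unit_id == unit_id]
    total_minutes = sum((t.end - t.start).total_seconds() for t in own) / 60
    return [
        Feature(unit_id, f"{label}_count", float(len(own))),
        Feature(unit_id, f"{label}_total_minutes", total_minutes),
    ]


# Usage with made-up values sampled once per minute for one statistical unit.
t0 = datetime(2022, 1, 1, 8, 0)
raw = [(t0 + timedelta(minutes=i), v)
       for i, v in enumerate([80, 60, 55, 70, 62, 58, 75])]
low_tracks = define_tracks(unit_id=1, samples=raw, threshold=65, label="low_value")
low_features = aggregate_tracks(unit_id=1, tracks=low_tracks, label="low_value")
```

In this sketch the domain knowledge sits entirely in `define_tracks` (what counts as a period of interest), while `aggregate_tracks` only applies generic summaries, which mirrors the split between the 2 steps described above.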


Journal: JMIR Medical Informatics
Publication Date: Oct 17, 2022
Citations: 3
License type: CC BY


Similar Papers

Integrating Clinical Data and Medical Imaging in Lung Cancer: Feasibility Study Using the Observational Medical Outcomes Partnership Common Data Model Extension.
Hyerim Ji ... Sooyoung Yoo
JMIR Medical Informatics | VOL. 12
12 Jul 2024

Transforming Anesthesia Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study.
Antoine Lamer ... Benjamin Popoff
Journal of Medical Internet Research | VOL. 23
29 Oct 2021

Assessing the Use of German Claims Data Vocabularies for Research in the Observational Medical Outcomes Partnership Common Data Model: Development and Evaluation Study.
Elisa Henke ... Michéle Zoch
JMIR Medical Informatics | VOL. 11
07 Nov 2023

Intensive Care Quality Indicators Dashboard Using Observational Medical Outcomes Partnership Common Data Model.
Boris Delange ... Marc Cuggia
Studies in Health Technology and Informatics | VOL. 316
22 Aug 2024
