Abstract
A wide array of biomedical data is generated and made available to healthcare experts. However, because these data are heterogeneous, it is difficult to predict outcomes from them. It is therefore necessary to combine the diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) that provides a global unified data structure for all data sources, and a “data modeler” tool that generates the unified dataset. The tool implements a user-centric, priority-based approach that resolves the problems of unified data modeling and of overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The data sources used to generate the unified diabetes mellitus dataset include clinical trial information, a social media interaction dataset and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we applied a well-known rule creation process based on rough set theory to derive rules from it. An evaluation of the tool on six different sets of locally created diverse datasets shows that, on average, it reduces the time and effort that experts and knowledge engineers spend creating unified datasets by 94.1%.
Highlights
A successful decision support system relies on high quality information created either by a knowledge engineer or automatically generated from the data
This article describes the problem of fusing multiple heterogeneous datasets into a unified dataset for different types of high-level analysis, knowledge acquisition and reasoning
An expert-centric, priority-based approach is proposed and implemented as the “data modeler” tool. The application has an extensible framework with an easy-to-use GUI that allows knowledge engineers to import multiple heterogeneous datasets through its import manager and combine them into a unified dataset
Summary
A successful decision support system relies on high-quality information, either created by a knowledge engineer or generated automatically from the data. A huge volume of human-centric personal data is available, but integrating it from various sources into a unified dataset is challenging. The integration of multiple heterogeneous data sources is an important research issue that is not limited to the healthcare arena. To enable the use of healthcare data in clinical decisions, automatic generation of a single unified dataset is desirable [1]. This task is very challenging due to a number of technical issues, such as semantic heterogeneity, different naming conventions, resolving conflicts between attribute values, finding intrinsic relationships, handling missing values and overlapping information, and converting local datasets into a global unified data model [2,3]. This paper focuses on the last four challenges and leaves the rest as future work.
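To make the idea of priority-based resolution of overlapping attributes concrete, the following is a minimal Python sketch. It is not the data modeler tool's actual algorithm, which is not detailed here. It assumes that every source identifies records by a shared key (a hypothetical patient ID), that the user supplies an ordered priority list of sources, and that a missing value is represented as `None`. The function name `unify`, the source names and all sample values are invented for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]
Source = Dict[str, Record]  # record key (e.g. a patient ID) -> attributes


def unify(sources: Dict[str, Source], priority: List[str]) -> Dict[str, Record]:
    """Merge sources into one dataset. For an overlapping attribute, the
    highest-priority source that holds a non-missing value wins."""
    unified: Dict[str, Record] = {}
    for name in priority:  # highest priority first
        for key, record in sources[name].items():
            target = unified.setdefault(key, {})
            for attr, value in record.items():
                # Fill only when no higher-priority source supplied a value.
                if value is not None and target.get(attr) is None:
                    target[attr] = value
                else:
                    # Keep the attribute visible even if it is still missing.
                    target.setdefault(attr, None)
    return unified


# Hypothetical sample data: three sources with overlapping attributes.
sources = {
    "clinical": {"p1": {"age": 54, "hba1c": 7.2, "weight_kg": None}},
    "sensor": {"p1": {"weight_kg": 81.0, "steps": 6400}, "p2": {"steps": 3000}},
    "social": {"p1": {"age": 50, "posts": 12}},
}

# The user ranks the clinical source above the sensor and social sources.
unified = unify(sources, priority=["clinical", "sensor", "social"])
print(unified)
```

In this sketch the conflicting `age` values for `p1` are resolved in favour of the clinical source because it is ranked first, while the missing clinical `weight_kg` is filled from the lower-priority sensor source. A real tool would also need schema mapping and record linkage, which are omitted here.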