Abstract
The amount of data generated daily by different sources grows exponentially and brings new challenges to information technology experts. The recorded data usually include heterogeneous attribute types, such as traditional date, numerical, textual, and categorical information, as well as complex ones, such as images, videos, and multidimensional data. Simply posing similarity queries over such records can underestimate the semantics and potential usefulness of particular attributes. In this context, Exploratory Data Analysis (EDA) is well suited to understanding data and to extracting and visualizing existing patterns. In this paper, we propose Sketch+, a technique and a corresponding supporting tool to compare electronic health records (provided by hospitals) by similarity. Sketch+ supports correlation-based exploratory analysis over attributes of different types and allows data preprocessing tasks for visualization and knowledge extraction. It computes partial and overall data correlation considering the distance spaces induced by the attributes, and it employs both ANOVA and association rules with lift correlations to study relationships between variables, allowing extensive data analysis. Among the tools provided, a pixel-oriented one helps analysts observe visual correlations among date, categorical, and numerical attributes. As a running case study, we employed three open databases of COVID-19 cases, showing that specialists can benefit from the inference modules of Sketch+ to analyze electronic records. The study highlights how Sketch+ can be employed to spot strong correlations among tuples and attributes, with statistically significant results. The exploratory analysis proved to be an essential complement to similarity search tasks, identifying and evaluating patterns from heterogeneous attributes.
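To make the lift measure mentioned above concrete, the following minimal sketch computes the lift of an association rule A -> B as P(A and B) / (P(A) P(B)), the standard definition. It is an illustration only, not the Sketch+ implementation: the function name `lift`, the set-based record encoding, and the toy records are all assumptions introduced here.

```python
from typing import Hashable, List, Set


def lift(records: List[Set[Hashable]],
         antecedent: Set[Hashable],
         consequent: Set[Hashable]) -> float:
    """Lift of the rule antecedent -> consequent over a list of records.

    Each record is modeled as a set of items (an assumed encoding).
    Lift > 1 suggests positive association, lift < 1 negative association,
    and lift == 1 statistical independence.
    """
    n = len(records)
    if n == 0:
        raise ValueError("lift is undefined for an empty record list")

    # Count records containing the antecedent, the consequent, and both.
    count_a = sum(1 for r in records if antecedent <= r)
    count_b = sum(1 for r in records if consequent <= r)
    count_ab = sum(1 for r in records
                   if antecedent <= r and consequent <= r)

    if count_a == 0 or count_b == 0:
        raise ValueError("lift is undefined when either side never occurs")

    # P(A and B) / (P(A) * P(B)) simplifies to count_ab * n / (count_a * count_b).
    return (count_ab * n) / (count_a * count_b)


# Hypothetical toy records, invented purely for illustration.
records = [
    {"fever", "positive"},
    {"fever", "positive"},
    {"fever", "negative"},
    {"cough", "negative"},
]

rule_lift = lift(records, {"fever"}, {"positive"})
print(f"lift(fever -> positive) = {rule_lift:.3f}")
```

In this toy data, "fever" appears in 3 of 4 records, "positive" in 2 of 4, and both together in 2 of 4, so the lift is above 1, which is the kind of signal an analyst would read as a positive association worth inspecting further.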