Abstract

Genomic Epidemiology (genEpi) is a branch of public health that uses many different data types, including tabular, network, genomic, and geographic data, to identify and contain outbreaks of deadly diseases. Due to the volume and variety of data, it is challenging for genEpi domain experts to conduct data reconnaissance; that is, to gain an overview of the data they have and assess its quality, completeness, and suitability. We present an algorithm for data reconnaissance through automatic visualization recommendation, GEViTRec. Our approach handles a broad variety of dataset types and automatically generates visually coherent combinations of charts, in contrast to existing systems that primarily focus on singleton visual encodings of tabular datasets. We automatically detect linkages across multiple input datasets by analyzing non-numeric attribute fields, creating a data source graph within which we analyze and rank paths. For each high-ranking path, we specify chart combinations with positional and color alignments between shared fields, using a gradual binding approach to transform initial partial specifications of singleton charts into complete specifications that are aligned and oriented consistently. A novel aspect of our approach is its combination of domain-agnostic elements with domain-specific information that is captured through a domain-specific visualization prevalence design space. Our implementation is applied to both synthetic data and real Ebola outbreak data. We compare GEViTRec's output to what previous visualization recommendation systems would generate, and to manually crafted visualizations used by practitioners. We conducted formative evaluations with ten genEpi experts to assess the relevance and interpretability of our results. Code, Data, and Study Materials Availability: https://github.com/amcrisan/GEVitRec.
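The linkage-detection step described above can be illustrated with a minimal sketch. It is not GEViTRec's implementation: it assumes that two datasets are linked when the string-valued entries of one field overlap sufficiently with those of another, and it scores that overlap with a Jaccard index against an arbitrary threshold. The function names, the threshold, and the toy datasets are all hypothetical, and the sketch stops at building graph edges rather than ranking paths.

```python
from itertools import combinations


def categorical_values(rows, field):
    """Collect the distinct non-empty string values of one field."""
    values = set()
    for row in rows:
        value = row.get(field)
        if isinstance(value, str) and value:
            values.add(value)
    return values


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_source_graph(datasets, threshold=0.5):
    """Return edges (ds_a, field_a, ds_b, field_b, score) between datasets
    whose non-numeric fields share enough values (assumed linkage rule)."""
    edges = []
    for (name_a, rows_a), (name_b, rows_b) in combinations(datasets.items(), 2):
        fields_a = sorted({f for row in rows_a for f in row})
        fields_b = sorted({f for row in rows_b for f in row})
        for field_a in fields_a:
            values_a = categorical_values(rows_a, field_a)
            if not values_a:
                continue  # numeric-only or empty field: not a linkage candidate
            for field_b in fields_b:
                values_b = categorical_values(rows_b, field_b)
                score = jaccard(values_a, values_b)
                if score >= threshold:
                    edges.append((name_a, field_a, name_b, field_b, score))
    return edges


# Hypothetical toy inputs: a tabular line list, phylogenetic tree tips,
# and the region names of a map.
datasets = {
    "line_list": [
        {"case_id": "C1", "country": "Guinea", "age": 34},
        {"case_id": "C2", "country": "Liberia", "age": 51},
    ],
    "tree_tips": [
        {"tip": "C1", "clade": "A"},
        {"tip": "C2", "clade": "B"},
    ],
    "map_regions": [
        {"region": "Guinea"},
        {"region": "Liberia"},
        {"region": "Sierra Leone"},
    ],
}

edges = build_source_graph(datasets)
for edge in edges:
    print(edge)
```

In this toy example the line list connects to the tree through matching case identifiers and to the map through country names, so a path through the resulting graph spans all three data types.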

Highlights

  • Data reconnaissance is the process of exploring a group of datasets that are not yet understood by a specific person; we recently defined data recon and proposed an iterative four-stage process of acquire, view, assess, pursue as a conceptual framework to reason about it [1]

  • The relevance criteria that we propose include broad coverage of input datasets and of different data types, and information from the domain-specific visualization prevalence design space to prioritize visual encodings commonly used in the domain

  • The solution we propose in GEViTRec is to manually analyze the full set of charts used in the domain design space, to determine viable combinations where it may be possible to establish a shared axis


Summary

Introduction

Data reconnaissance is the process of exploring a group of datasets that are not yet understood by a specific person; we recently defined data recon and proposed an iterative four-stage process of acquire, view, assess, pursue as a conceptual framework to reason about it [1]. We posit that visualization recommendation systems have great promise for operationalizing the goal of data recon through a concrete algorithm that quickly and automatically computes reasonable visual encodings with minimal input from a user. A recommender system could speed up the view stage of the data recon process, in contrast to any kind of design or selection process for visual encoding that involves human judgement. After the recon process concludes, when an appropriate collection of datasets has been acquired, a more lengthy analysis process with traditional investigative exploration visualization tools could occur.

