Data vocalization

Immanuel Trummer, Mark Bryan, Jiancheng Zhu

doi:10.14778/3137628.3137663

Abstract

Research on data visualization aims at finding the best way to present data via visual interfaces. We introduce the complementary problem of "data vocalization": presenting relational data as efficiently as possible via voice output. This problem setting is motivated by emerging tools and devices (e.g., Google Home, Amazon Echo, Apple's Siri, or voice-based SQL interfaces) that communicate data to their users primarily via audio output. We treat voice output generation as an optimization problem whose goal is to minimize speaking time while transmitting an approximation of a relational table to the user. We consider constraints on the precision of the transmitted data as well as on the cognitive load placed on the listener. We formalize voice output optimization and show that it is NP-hard. We present three approaches to solving the problem. First, we show how the problem can be translated into an integer linear program, which enables us to apply corresponding solvers. Second, we present a two-phase approach that forms groups of similar rows in a pre-processing step, using a variant of the Apriori algorithm, and then selects an optimal combination of groups to generate a speech. Finally, we present a greedy algorithm that runs in polynomial time. Under simplifying assumptions, we prove that it generates near-optimal output by leveraging the submodularity of our cost function. We compare our algorithms experimentally and analyze their complexity.
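
As a rough illustration of the greedy idea mentioned in the abstract, the sketch below picks groups of rows one at a time, each time choosing the group that covers the most rows not yet covered. This is not the paper's algorithm or cost function: it swaps in plain set coverage, a standard monotone submodular objective, and uses a cap on the number of spoken facts as a stand-in for the cognitive-load constraint. The names `greedy_select`, `groups`, and `max_facts`, as well as the toy data, are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(groups: Dict[str, Set[int]], max_facts: int) -> List[str]:
    """Greedily pick up to max_facts groups, maximizing newly covered rows.

    Assumption: coverage stands in for the paper's (different) cost function.
    """
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(max_facts):
        best_name, best_gain = None, 0
        # sorted() makes tie-breaking deterministic (alphabetical order).
        for name in sorted(groups):
            if name in chosen:
                continue
            gain = len(groups[name] - covered)  # marginal gain of this group
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining group adds coverage
            break
        chosen.append(best_name)
        covered |= groups[best_name]
    return chosen


# Hypothetical toy input: each group is the set of row ids that one spoken
# sentence could summarize.
groups = {
    "city=NYC": {0, 1, 2, 3},
    "year=2016": {2, 3, 4},
    "type=hotel": {4, 5},
    "price<100": {0, 5},
}
selected = greedy_select(groups, max_facts=2)
print(selected)
```

Because the marginal gain of any group can only shrink as more rows are covered, greedy selection of this kind comes with the classical (1 - 1/e) approximation guarantee for monotone submodular objectives under a cardinality constraint. The guarantee proved in the paper applies to its own cost function and assumptions, not to this toy objective.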

Full Text