Research on user errors in retrieving information from SQL databases has focused on erroneous syntax in the query language and erroneous semantics concerning the data model. In the present paper, we investigate a third source of error, namely erroneous aggregations that violate the constraints imposed by the numerical properties of the data. An erroneous aggregation might arise from the SQL programmer’s misunderstanding of those numerical properties, or from a simple mistake. We show that, for database queries in the SQL language, significant classes of erroneous aggregations can be detected by non-intrusive, off-line checking, using only a simple set of metadata rules that can be supplied by the data provider. We have implemented software that performs static checks on users’ SQL queries, looking for evidence of misunderstandings concerning the measurement properties of the numerical data.
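To make the idea of a static, metadata-driven check concrete, the following is a minimal sketch and not the software described above. It assumes a hypothetical rule format in which the data provider labels each column with a measurement scale, and an illustrative table of which aggregates are treated as meaningful for each scale. The column names, the `COLUMN_SCALE` and `ALLOWED` tables, and the `check_aggregations` function are all assumptions introduced for this example. The regular expression only recognises simple calls such as `SUM(col)`; a real checker would need a proper SQL parser.

```python
import re

# Hypothetical metadata supplied by the data provider: column -> measurement scale.
COLUMN_SCALE = {
    "patient_id": "nominal",
    "pain_score": "ordinal",
    "temperature_c": "interval",
    "weight_kg": "ratio",
}

# Illustrative rule table (an assumption): aggregates treated as meaningful per scale.
ALLOWED = {
    "nominal": {"COUNT"},
    "ordinal": {"COUNT", "MIN", "MAX"},
    "interval": {"COUNT", "MIN", "MAX", "AVG"},
    "ratio": {"COUNT", "MIN", "MAX", "AVG", "SUM"},
}

# Matches simple aggregate calls on an unqualified column, e.g. SUM(weight_kg).
# COUNT(*) and qualified names such as t.weight_kg are deliberately not matched.
AGG_CALL = re.compile(
    r"\b(COUNT|SUM|AVG|MIN|MAX)\s*\(\s*(?:DISTINCT\s+)?(\w+)\s*\)",
    re.IGNORECASE,
)


def check_aggregations(sql):
    """Return warnings for aggregates that the metadata rules do not allow.

    The query text is inspected only; it is never executed.
    """
    warnings = []
    for func, column in AGG_CALL.findall(sql):
        scale = COLUMN_SCALE.get(column.lower())
        if scale is None:
            continue  # no metadata for this column, so nothing to check
        if func.upper() not in ALLOWED[scale]:
            warnings.append(
                f"{func.upper()}({column}) is not meaningful for {scale} data"
            )
    return warnings


if __name__ == "__main__":
    query = "SELECT SUM(temperature_c), AVG(weight_kg) FROM readings"
    for warning in check_aggregations(query):
        print(warning)
```

Because the check reads only the query text and the metadata table, it can run off-line without touching the database, which is the sense in which such checking is non-intrusive.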