Abstract

This paper presents a set of analyses aimed at better understanding the SQLShare workload (Jain et al., 2016) and at learning users’ analysis behavior. SQLShare is a database-as-a-service platform targeting scientists and data scientists with minimal database experience; its workload was made available to the research community. According to Jain et al. (2016), this workload is the only one containing primarily ad-hoc, hand-written queries over user-uploaded datasets. We analyze this workload by comparing users’ explorations (sequences of queries), identifying the SQL operations users commonly perform during data analysis, and studying query complexity. We use a clustering algorithm to retrieve groups of similar explorations, and we examine the resulting clusters through statistical and visual indicators to explain the analysis patterns within each cluster. To our knowledge, this is the first attempt to characterize human analysis behavior in SQL workloads.
