Abstract

A significant amount of data mining analysis is performed outside a database system, which creates many data management issues. This article summarizes our experience and recommendations for performing data set preprocessing and transformation (i.e. data cleaning, record selection, summarization, denormalization, variable creation, coding) inside a database system. This preparation is the most time-consuming task in data mining projects, yet it is largely ignored in the literature. We present practical issues, common solutions and lessons learned when preparing and transforming data sets with the SQL language, based on experience from real-life projects. We then provide specific guidelines for translating programs written in a traditional programming language into SQL statements. Based on successful real-life projects, we present time performance comparisons between SQL code running inside the database system and external data mining programs. We highlight which steps in data mining projects become faster when processed by the database system. More importantly, we identify advantages and disadvantages from a practical standpoint, based on feedback from data mining users.

