Abstract

Data mining has achieved considerable success in almost every domain, such as health care, wireless sensor networks, and social networks, through the development of its various algorithms. Every data mining algorithm, however, has inherent limitations. The application domain and the actual data together heavily influence both the choice and the performance of any data mining, machine learning, or statistical algorithm. The contribution of this paper is that it brings together, under a single roof, a number of data mining issues and the metrics used to measure data quality and algorithm performance. It elaborates the seven most vital issues in data mining: missing value imputation, feature selection, outlier detection, cluster analysis of high-dimensional data, imbalanced classes in classification, privacy of data, and mining from complex or distributed data. It not only presents these issues but also discusses their existing solutions. The survey also sheds light on limitations and research gaps for prospective researchers. The issues were identified after an extensive study of 50 papers, carefully selected so as to investigate core issues in data mining that still need to be addressed. This work also groups thirty data quality and algorithm performance metrics used in the literature into three categories. This comprehensive understanding of the issues and metrics can be valuable for beginners in data mining research. The survey shows that the most frequently used algorithm performance measures are accuracy and time complexity.

