Data preprocessing in predictive data mining

Stamatios-Aggelos N Alexandropoulos, Michael N Vrahatis, Sotiris B Kotsiantis

doi:10.1017/s026988891800036x

Open Access

https://doi.org/10.1017/s026988891800036x

Abstract

A large variety of issues influence the success of data mining on a given problem. Two primary issues are the representation and the quality of the dataset. Specifically, if the data contain much redundant and irrelevant or noisy and unreliable information, knowledge discovery becomes very difficult. It is well known that data preparation steps require significant processing time in machine learning tasks. It would be very helpful if there were preprocessing algorithms that performed reliably and effectively across all datasets, but this is impossible. To this end, we present the most well-known and widely used up-to-date algorithms for each step of data preprocessing in the framework of predictive data mining.

Full Text