Abstract

Current research constantly produces enormous amounts of information, which challenges data mining algorithms. Many problems in highly relevant research areas, such as bioinformatics, security and intrusion detection, or text mining, involve large or very large datasets. One of the most common ways to handle such datasets is data reduction, and feature selection and instance selection are arguably the most widely used data reduction methods. Feature and instance weighting, by contrast, focus on improving the performance of the data mining task. Because these four methods (instance selection, feature selection, instance weighting and feature weighting) have different aims, they can be combined to improve the performance of the data mining methods used. In this paper, a general framework for combining these four tasks is presented, and a comprehensive study of the usefulness of the 15 possible combinations is performed. Using a large set of 80 problems, the behavior of all possible combinations is studied in terms of classification performance, data reduction and execution time. These factors are also studied using 60 class-imbalanced datasets.
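
The count of 15 combinations is consistent with taking every non-empty subset of the four tasks (2^4 - 1 = 15). The short Python sketch below enumerates those subsets under that assumption; the abstract does not say how the combinations are defined or ordered, and the names `TASKS` and `task_combinations` are illustrative only, not taken from the paper.

```python
from itertools import combinations

# The four tasks named in the abstract.
TASKS = (
    "instance selection",
    "feature selection",
    "instance weighting",
    "feature weighting",
)


def task_combinations(tasks=TASKS):
    """Return every non-empty subset of the tasks (assumed reading of 'combination')."""
    return [
        subset
        for size in range(1, len(tasks) + 1)
        for subset in combinations(tasks, size)
    ]


combos = task_combinations()
print(len(combos))  # 2**4 - 1 = 15
for subset in combos:
    print(" + ".join(subset))
```

Under this reading, the four single-task entries correspond to each method applied on its own, and the final entry applies all four together.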
