Abstract
Feature selection has been widely discussed as an important preprocessing step in data mining applications, since it reduces a model's complexity. In this paper, the limitations of several representative reduction methods are first analyzed. Then, by distinguishing consistent objects from inconsistent objects, the decision inclusion degree and its probability distribution function are presented as a new measure for both inconsistent and consistent simplified decision systems. New definitions of the distribution reduct and the maximum distribution reduct for simplified decision systems are proposed, and many important propositions, properties, and conclusions about reducts are derived. Using radix sorting and hashing techniques, a heuristic distribution reduct algorithm for feature selection is constructed. Finally, experiments comparing the proposed approach with other feature selection algorithms on six UCI datasets show that it is effective and suitable for both consistent and inconsistent decision systems.
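To make the notions of consistent objects and distribution-preserving reducts concrete, the following is a minimal sketch under stated assumptions. It is not the paper's algorithm: it uses the classical definition (a subset B preserves the distribution if every object keeps the same decision distribution under B as under the full attribute set) rather than the paper's new definitions or its decision inclusion degree, and it relies on Python dict hashing in place of the radix sorting and hashing the paper describes. All function names (`partition`, `object_distributions`, `consistent_objects`, `preserves_distribution`, `backward_distribution_reduct`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on attrs.

    A hash map keyed on the attribute-value tuple lets one pass over the rows suffice.
    """
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def object_distributions(rows, decisions, attrs):
    """For every object, return the decision distribution of its class under attrs."""
    labels = sorted(set(decisions))
    dist = [None] * len(rows)
    for block in partition(rows, attrs):
        counts = Counter(decisions[i] for i in block)
        mu = tuple(Fraction(counts[d], len(block)) for d in labels)
        for i in block:
            dist[i] = mu
    return dist


def consistent_objects(rows, decisions, attrs):
    """An object is consistent if every object in its class has the same decision."""
    return [max(mu) == 1 for mu in object_distributions(rows, decisions, attrs)]


def preserves_distribution(rows, decisions, subset, full):
    """True if each object's decision distribution under subset equals that under full."""
    return (object_distributions(rows, decisions, subset)
            == object_distributions(rows, decisions, full))


def backward_distribution_reduct(rows, decisions, attrs):
    """Greedy backward elimination: drop an attribute whenever the distribution survives."""
    current = list(attrs)
    for a in list(attrs):
        candidate = [b for b in current if b != a]
        if preserves_distribution(rows, decisions, candidate, attrs):
            current = candidate
    return current


# Hypothetical toy decision table: three condition attributes (by index), one decision.
rows = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 0, 0)]
decisions = ["y", "n", "n", "y"]
print(consistent_objects(rows, decisions, [0, 1, 2]))            # [False, False, True, True]
print(backward_distribution_reduct(rows, decisions, [0, 1, 2]))  # [1, 2]
```

Because distribution preservation is monotone (any superset of a preserving subset also preserves), a single backward pass ends with a subset from which no single attribute can be removed. The result depends on the attribute order: here `[0, 1]` would also be a valid reduct.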