Feature subset selection for high dimensional data with domain analysis using Semantic Mining

Abdul Majeed K. M

doi:10.9790/0661-1651108111

Abstract

Feature subset selection involves identifying a subset of the most useful features that produces results comparable to those obtained with the original, complete set of features. A feature selection algorithm may be evaluated from both the efficiency and the effectiveness points of view. Efficiency concerns the time required to find a subset of features, while effectiveness relates to the quality of that subset. Existing algorithms for feature subset selection rely solely on statistical tests, such as the Pearson correlation test or the symmetric uncertainty test, to measure the correlation between features, and then apply a threshold to filter out redundant and irrelevant features. FAST, proposed by Qinbao Song (9), uses the symmetric uncertainty test for feature subset selection. In this work we extend the FAST algorithm by applying domain analysis using semantic mining to improve the relevance of the selected feature subset.

Index Terms: Feature subset selection, filter method, feature clustering, graph-based clustering
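The statistical filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete-valued features and uses the standard definition of symmetric uncertainty, SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)). The function names, the dictionary layout of the features, and the threshold value of 0.1 are illustrative assumptions, not details taken from the paper or from FAST itself.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so neither carries information.
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def select_relevant(features, target, threshold=0.1):
    """Keep features whose SU with the target exceeds the threshold.

    `features` maps a (hypothetical) feature name to its column of values;
    the threshold of 0.1 is an arbitrary illustrative choice.
    """
    return [
        name
        for name, column in features.items()
        if symmetric_uncertainty(column, target) > threshold
    ]


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    features = {
        "a": [0, 0, 1, 1],  # identical to the target
        "b": [0, 1, 0, 1],  # statistically independent of the target
    }
    print(select_relevant(features, target))
```

This sketch covers only the relevance filter against the target; removing redundancy among the retained features, and the semantic domain analysis proposed in this work, are separate steps that it does not attempt.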

Full Text