An Ameliorated Methodology for Feature Subset Selection on High Dimensional Data using Precise Relevance Measures

Asha T, Kaveri B.V

doi:10.5120/ijca2015906344

Abstract

Attribute subset selection refers to the process of choosing the set of attributes that best describes a dataset. When the selected attributes are used in machine learning tasks such as clustering and classification, they should produce the same results as the original dataset. An attribute subset selection method must therefore be efficient at selecting the relevant attributes and accurate at eliminating the redundant ones. To meet these two goals, we have designed a feature subset selection method based on precise relevance measures. We first efficiently select the relevant attributes using the relevance measure symmetric uncertainty (SU). The selected relevant attributes are then divided into clusters by a graph-theoretic clustering method that uses the relevance measure conditional mutual information (CMI). Symmetric uncertainty is then used again to select, from each cluster, the attributes that are strongly related to the target class and best represent that cluster, giving an accurate and independent subset of features. The proposed method not only produces a smaller, more accurate subset of features but also improves the performance of machine learning algorithms such as the naive Bayes classifier.
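The two relevance measures named above can be computed from empirical frequencies on discrete data. The following is a minimal Python sketch, assuming all attributes have already been discretized. It uses the standard definitions SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z). The function names and the `threshold` parameter of the relevance filter are hypothetical illustrations, not the authors' code. The graph-theoretic clustering step is not sketched, because the abstract does not say how CMI is used to build or cut the graph.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(*columns: Sequence[Hashable]) -> float:
    """Joint Shannon entropy (in bits) of one or more equal-length discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))  # frequency of each joint value tuple
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # assumption: two constant columns are treated as sharing no information
    mutual_info = hx + hy - entropy(x, y)  # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)


def conditional_mutual_information(
    x: Sequence[Hashable], y: Sequence[Hashable], z: Sequence[Hashable]
) -> float:
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def relevant_features(
    features: Dict[str, Sequence[Hashable]],
    target: Sequence[Hashable],
    threshold: float,
) -> List[str]:
    """Hypothetical relevance filter: keep features whose SU with the class exceeds
    `threshold`, ordered from most to least relevant."""
    scores = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    kept = [name for name, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda name: -scores[name])


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    features = {
        "copy": [0, 0, 1, 1],   # identical to the class
        "noise": [0, 1, 0, 1],  # independent of the class
        "const": [5, 5, 5, 5],  # carries no information
    }
    print(relevant_features(features, target, threshold=0.0))
    # X and Y are independent, but given Z = X xor Y each determines the other.
    print(conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]))
```

In this toy example only `copy` passes the filter (its SU with the class is 1.0, while `noise` and `const` score 0.0). The CMI call returns 1.0 bit even though the two variables are marginally independent, which illustrates why a conditional measure can reveal dependencies that plain mutual information misses.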
