Abstract

High dimensionality and sparsity are the primary issues in text classification. The most effective way to address them is to select a subset of features using feature selection approaches, and the most common and effective methods for this are filter techniques. Performance metrics such as Micro-F1, Macro-F1, and Accuracy are used to evaluate filter methods on datasets. These metrics depend on a classification algorithm. However, filter techniques evaluate the information in each feature individually, without considering the relationships between features. As a result, classifier-based metrics may not reveal the actual performance of the filter technique, which makes existing evaluation methods insufficient for testing the validity of a proposed method. To address this, this study proposes a novel performance metric called Selection Error (SE) for evaluating the actual performance of filter techniques. The Selection Error metric allows the information value of the selected features to be analyzed more accurately than existing methods, without relying on a classifier. The feature selection performance of the filtering approaches was evaluated on six different datasets with both Selection Error and traditional performance metrics. The results show a strong relationship between the proposed metric and the classification performance metrics. Selection Error aims to contribute to the literature by demonstrating the success of filter feature selection methods regardless of classifier performance.
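
To make the point about filter techniques concrete, the following is a minimal sketch of a filter that scores each feature on its own. It assumes the chi-square statistic as the filter, with binary term presence and binary class labels; the abstract does not name the filters used in the study, and the names `chi2_score`, `select_top_k`, `columns`, and `labels` are hypothetical. The sketch does not implement Selection Error, which the abstract does not define.

```python
from typing import List, Sequence


def chi2_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-square statistic of one binary feature against a binary label.

    Assumption: chi-square stands in for "a filter technique" here.
    """
    n = len(labels)
    pairs = list(zip(feature, labels))
    # Cells of the 2x2 contingency table (term present/absent x class 1/0).
    a = sum(1 for f, y in pairs if f == 1 and y == 1)
    b = sum(1 for f, y in pairs if f == 1 and y == 0)
    c = sum(1 for f, y in pairs if f == 0 and y == 1)
    d = sum(1 for f, y in pairs if f == 0 and y == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or label is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(columns: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Rank features by their individual scores and keep the k best indices."""
    scores = [chi2_score(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (made up for illustration): six documents, three terms.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0],  # term 0: perfectly aligned with the class
    [1, 1, 1, 0, 0, 0],  # term 1: exact duplicate of term 0 (redundant)
    [1, 1, 0, 0, 0, 1],  # term 2: weaker signal, but not a duplicate
]

selected = select_top_k(columns, labels, k=2)
```

In this toy case the filter keeps terms 0 and 1, although term 1 adds nothing beyond term 0. Because each score is computed in isolation, the redundancy is invisible to the filter, which is the limitation the abstract describes.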
