Abstract

Feature selection is an important preprocessing step in machine learning and data mining. Information theory is widely used in feature selection because it can measure both linear and nonlinear correlations among variables. Traditional information theory-based feature selection methods aim to maximize feature relevancy while minimizing feature redundancy. However, when computing feature relevancy, previous methods consider either the effect of candidate features or the effect of already-selected features, but not both. In fact, both candidate features and already-selected features provide important classification information for the design of the feature relevancy term. To address this problem, we extract useful classification information from joint mutual information to design a novel feature relevancy term named Conditional-Weight Joint Relevance (CWJR). Based on CWJR, we propose a novel feature selection method named Feature Selection considering Conditional-Weight Joint Relevance (CWJR-FS). Additionally, to distinguish our method from previous methods, we divide information theory-based feature selection methods into two categories: linear-based and nonlinear-based feature selection methods. Finally, we compare our method with seven linear-based methods and four nonlinear-based methods on 19 benchmark data sets. The experimental results demonstrate that CWJR-FS outperforms the compared methods in terms of average classification accuracy, AUC, and F1 score.
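To make the "maximize relevancy while minimizing redundancy" idea concrete, the following is a minimal sketch of a generic greedy selector that scores each candidate by its mutual information with the class label minus its mean mutual information with the already-selected features. It is not the CWJR-FS criterion, whose definition is not given in this abstract; the function names `mutual_information` and `greedy_select`, the plug-in estimator for discrete features, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from two discrete 1-D sample arrays."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)   # contingency counts
    joint /= joint.sum()              # empirical joint distribution
    # Product of the two marginals, same shape as the joint table.
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0                    # empty cells contribute nothing
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def greedy_select(X, y, k):
    """Generic relevance-minus-redundancy greedy selection (illustrative, not CWJR-FS)."""
    X = np.asarray(X)
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))

    def score(j):
        if not selected:
            return relevance[j]
        # Mean MI between candidate j and the features chosen so far.
        redundancy = np.mean(
            [mutual_information(X[:, j], X[:, s]) for s in selected]
        )
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the label depends on two independent bits a and b;
# column 1 duplicates a, and column 3 is independent of the label.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
b = np.array([0, 0, 1, 1, 0, 0, 1, 1])
noise = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y = 2 * a + b
X = np.column_stack([a, a, b, noise])
print(greedy_select(X, y, 2))
```

On this toy data the redundancy penalty steers the second pick away from the duplicated column and toward the feature carrying new class information. That trade-off is what the traditional criteria described above aim for.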
