Abstract
Classification learning from large-scale data is a pivotal yet complex task in machine learning. Classifier performance can be enhanced through effective feature selection and construction; however, current feature selection methods often struggle with large-scale data. Because granulation is essential for developing accurate and efficient models, we first introduce a novel granulation technique, termed consistent granulation. Building on it, we propose a feature construction method that improves the computational speed, scalability, and classification performance of feature selection. As a result, both time and space complexity are reduced to linear in the number of samples. Consistent granulation also enables an accurate representation of the data's topological structure and reduces the parameter space from the interval [0, 1] to a finite set, simplifying the search for optimal parameters. Experimental results show that the proposed algorithm, FCG, outperforms four classical feature selection algorithms (FARNeMF, HANDI, GBNRS, and RMDPS) on datasets with millions of samples and tens of thousands of dimensions. Specifically, on the first fourteen datasets, FCG reduces average runtime by factors of 1029.30 (FARNeMF), 703.54 (HANDI), 1253.78 (GBNRS), and 3.08 (RMDPS), while improving average classification accuracy by 3.24% (FARNeMF), 2.74% (HANDI), 5.77% (GBNRS), and 15.5% (RMDPS).