High dimensionality in large datasets can severely degrade the data mining process. Feature selection is therefore an essential preprocessing stage that reduces the dimensionality of the dataset by selecting the most informative features while improving classification accuracy. This paper proposes a novel binary Gray Wolf Optimization algorithm for feature selection in classification tasks. First, because the historical best position of each search agent points toward more promising areas, the algorithm linearly combines the best positions of the search agents, which strengthens exploration and improves its global search ability. Second, a novel quadratic interpolation technique that integrates population diversity with local exploitation improves both the diversity of the population and the convergence accuracy. Third, chaotic perturbations (small random fluctuations) applied to the convergence factor during the exploration phase help avoid premature convergence and promote exploration of the search space. Finally, a novel transfer function processes feature information differently at different stages of the search, enabling the algorithm to optimize effectively in the binary space and select the optimal feature subset. The proposed method uses a k-nearest neighbor classifier and is evaluated with 10-fold cross-validation on 32 datasets. Experimental comparisons with other advanced algorithms demonstrate the effectiveness of the proposed algorithm.
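
To make the wrapper setup concrete, the following minimal sketch shows how a continuous search-agent position might be mapped to a binary feature mask and scored with a k-nearest neighbor classifier under cross-validation. It is an illustration under stated assumptions, not the proposed method: the standard sigmoid (S-shaped) transfer function stands in for the paper's novel transfer function, and the weight `alpha`, the choice `k=5`, the interleaved fold assignment, and all function names are hypothetical.

```python
import math
import random


def s_shaped_transfer(x):
    """Standard sigmoid transfer function (assumed stand-in, not the paper's)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Map a continuous position to a 0/1 mask: bit j is 1 with probability T(x_j)."""
    return [1 if rng.random() < s_shaped_transfer(x) else 0 for x in position]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)


def cv_error(X, y, mask, k=5, folds=10):
    """k-NN error rate on the selected features, using interleaved k-fold CV."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty feature subset is treated as the worst case
    Xs = [[row[j] for j in cols] for row in X]
    wrong = 0
    for f in range(folds):
        # Sample i belongs to test fold i % folds (assumed, simple split).
        train = [i for i in range(len(Xs)) if i % folds != f]
        test = [i for i in range(len(Xs)) if i % folds == f]
        train_X = [Xs[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            if knn_predict(train_X, train_y, Xs[i], k) != y[i]:
                wrong += 1
    return wrong / len(Xs)


def fitness(X, y, mask, alpha=0.99):
    """Weighted sum of error rate and subset ratio; lower is better.

    alpha is an assumed weight, not a value taken from the paper.
    """
    return alpha * cv_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)
```

A binary optimizer would call `binarize` on each agent's position at every iteration and use `fitness` to rank the resulting masks, so that subsets with lower error and fewer features are preferred.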