Abstract

Finding an optimal combination of features in high-dimensional text vectors is an essential step in classifying Scientific and Technical Service (STS) resources. However, existing metaheuristic feature selection methods suffer from degraded performance and increased computational cost on high-dimensional datasets. To address these issues, a novel Enhanced Binary Black Hole (EBBH) algorithm is proposed to improve the mining of optimal subsets of high-dimensional attributes in STS resource text. In the proposed EBBH, all operators employ a novel binary encoding framework to reduce the number of features, which provides a more efficient way to search the solution space. As a pre-processing step, prime feature attention assigns different selection weights based on the variable importance ranking from random forests, ensuring that a high-quality initial population is generated. Moreover, we incorporate an adaptive enhancement strategy to maintain the balance between exploitation and exploration during the search. The proposed EBBH is evaluated on eight high-dimensional benchmark datasets and then successfully applied to five STS resource datasets. Comparative experiments with 12 state-of-the-art algorithms for high-dimensional problems show the superiority of EBBH in terms of fitness, classification accuracy, and number of selected features.
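
The importance-weighted initialization described above can be illustrated with a minimal sketch. The abstract does not give the weighting rule that EBBH uses, so the linear mapping from importance rank to selection probability is an assumption made here for illustration, and the function name `importance_weighted_population` and the parameters `p_min` and `p_max` are hypothetical.

```python
import numpy as np


def importance_weighted_population(importance, pop_size=20, p_min=0.05,
                                   p_max=0.9, seed=0):
    """Sample binary feature masks biased toward highly ranked features.

    Assumption (not from the paper): selection probability grows linearly
    with the importance rank, from p_min (lowest rank) to p_max (highest).
    """
    rng = np.random.default_rng(seed)
    importance = np.asarray(importance, dtype=float)
    n = importance.size
    # Double argsort gives each feature's rank: 0 = least important.
    ranks = importance.argsort().argsort()
    probs = p_min + (p_max - p_min) * ranks / max(n - 1, 1)
    # Each row is one candidate solution; True means "feature selected".
    population = rng.random((pop_size, n)) < probs
    # Repair empty masks so every candidate keeps at least one feature.
    for mask in population:
        if not mask.any():
            mask[importance.argmax()] = True
    return population.astype(np.uint8), probs


if __name__ == "__main__":
    # Demo with random-forest importances on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=50,
                               n_informative=5, random_state=0)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    pop, probs = importance_weighted_population(forest.feature_importances_)
    print(pop.shape, pop.sum(axis=1))
```

Each row of the returned array is a binary mask over the features, so a candidate solution can be applied to the data as `X[:, mask.astype(bool)]` before training a classifier to score it.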
