Abstract
Due to the ever-increasing number of documents in digital form, automated text clustering has become a promising method for text analysis over the last few decades. A major issue in text clustering is the high dimensionality of the feature space. Most of these features are irrelevant, redundant, or noisy, and they mislead the underlying algorithm. Therefore, feature selection is an essential step in text clustering to reduce the dimensionality of the feature space and to improve the accuracy of the underlying clustering algorithm. In this paper, a hybrid intelligent algorithm, which combines binary particle swarm optimization (BPSO) with opposition-based learning, a chaotic map, a fitness-based dynamic inertia weight, and mutation, is proposed to solve the feature selection problem in text clustering. Here, the fitness-based dynamic inertia weight is integrated with the BPSO to control the movement of the particles based on their current status, and the mutation and chaotic strategies are applied to enhance the global search capability of the algorithm. Moreover, an opposition-based initialization is used to start with a set of promising and well-diversified solutions to achieve a better final solution. In addition, the opposition-based learning method is also used to generate the opposite position of the gbest particle to escape stagnation in the swarm. To demonstrate the effectiveness of the proposed method, experimental analysis is conducted on three benchmark text datasets: Reuters-21578, Classic4, and WebKB. The experimental results show that the proposed method selects a more informative feature set than the competing methods, as it attains higher clustering accuracy. Moreover, it also improves the convergence speed of the BPSO.
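The abstract names the components of the method but gives no formulas, so the following is only a minimal sketch of two of them under stated assumptions: opposition-based initialization and a single BPSO update. It assumes the textbook BPSO rule (a velocity update followed by sigmoid-based bit sampling) and takes the opposite of a binary position to be its bit-wise complement. The fitness function `toy_fitness`, the constants `c1`, `c2`, `v_max`, and the fixed inertia weight `w` are hypothetical placeholders; the chaotic map, the fitness-based dynamic inertia weight, and the mutation step are not shown.

```python
import math
import random

def opposite(position):
    """Opposite of a binary position: flip every bit (a + b - x with a=0, b=1)."""
    return [1 - bit for bit in position]

def opposition_based_init(n_particles, n_features, fitness, rng):
    """Draw random binary positions, add their opposites, keep the fittest n_particles."""
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(n_particles)]
    candidates = population + [opposite(p) for p in population]
    candidates.sort(key=fitness, reverse=True)  # assumption: higher fitness is better
    return candidates[:n_particles]

def bpso_step(position, velocity, pbest, gbest, w, rng, c1=2.0, c2=2.0, v_max=6.0):
    """One textbook BPSO update: velocity update, then sigmoid-based bit sampling."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))           # clamp so the sigmoid does not saturate
        prob_one = 1.0 / (1.0 + math.exp(-v))    # probability that this feature is selected
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity

def toy_fitness(position):
    """Hypothetical stand-in for a clustering-quality score: favour the first three features."""
    return sum(position[:3]) - 0.5 * sum(position[3:])

rng = random.Random(0)
swarm = opposition_based_init(n_particles=6, n_features=8, fitness=toy_fitness, rng=rng)
gbest = max(swarm, key=toy_fitness)
moved, vel = bpso_step(swarm[0], [0.0] * 8, swarm[0], gbest, w=0.7, rng=rng)
```

Each particle is a bit vector in which a 1 means the corresponding term is kept as a feature. In a real setting, `toy_fitness` would be replaced by a measure of clustering quality computed on the documents restricted to the selected features.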