Abstract

Text classification suffers from the high dimensionality and sparseness of the feature space, which makes feature selection (FS) an important stage of the pre-processing phase. Recently, point-wise mutual information (PMI), a common concept in information theory, has been used as an effective and widely adopted approach for FS. This paper proposes a new global PMI-based FS method that follows a hybrid approach, combining the advantages of the filter approach and the wrapper approach in two phases. In the first phase, a ranking-based filter approach is implemented by applying the information gain method. In the second phase, a subset selection-based filter approach is implemented by introducing the global PMI-based FS method, which is designed around two basic principles: (1) a class-dependent assumption for computing the correlation between pairs of features, and (2) an embedded wrapper approach. The results show that the proposed hybrid method can outperform state-of-the-art FS methods in both classification performance and dimension reduction.
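The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes documents are represented as sets of terms and uses the textbook definitions of information gain and PMI; the function names (`information_gain`, `class_pmi`) and the toy data are hypothetical, and the paper's actual global PMI criterion and embedded wrapper step are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Textbook information gain of a term's presence/absence (phase-one ranking score)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Conditional entropy of the class given whether the term occurs.
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def class_pmi(docs, labels, a, b, cls):
    """PMI between terms a and b, estimated only from documents of class cls.

    Assumption: this is one plausible reading of a class-dependent
    correlation between a pair of features, not the paper's formula.
    """
    in_class = [d for d, y in zip(docs, labels) if y == cls]
    n = len(in_class)
    if n == 0:
        raise ValueError(f"no documents with class {cls!r}")
    p_a = sum(a in d for d in in_class) / n
    p_b = sum(b in d for d in in_class) / n
    p_ab = sum(a in d and b in d for d in in_class) / n
    if p_ab == 0:
        return float("-inf")  # the two terms never co-occur in this class
    return math.log2(p_ab / (p_a * p_b))


# Toy corpus (hypothetical): each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]

# Phase one: rank the vocabulary by information gain, highest first.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In a pipeline of this shape, the top of `ranked` would be kept as the candidate set, and a pairwise score such as `class_pmi` could then be used to discard features that are strongly correlated with ones already selected.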
