Abstract
To address conceptual ambiguity and the hidden semantic structure of text in multi-label text categorization, an ensemble classification method is proposed that combines the random forest (RF) algorithm with the semantic core co-occurrence latent semantic vector space model (CLSVSM). Randomly partitioning the words increases the diversity of the ensemble and yields different orthogonal projections onto low-dimensional latent semantic spaces. Random forests handle binary classification problems effectively, and latent semantics reveals the underlying semantic structure of the text. Combining the two gives individual classifiers that are both diverse and accurate. Experimental results on the Yahoo dataset demonstrate the effectiveness of the proposed method, which outperforms the compared methods on Hamming loss, coverage, one-error, and average precision.
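For readers unfamiliar with the evaluation measures, the following is a minimal sketch of three of them (Hamming loss, coverage, and the measure usually called one-error in the multi-label literature), written in plain Python. It is not the authors' evaluation code: the function names, the 0/1 label-matrix layout, and the convention that coverage is the maximum rank of a true label minus one are assumptions made for illustration. The coverage function also assumes every document has at least one true label.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (document, label) pairs whose 0/1 prediction is wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def one_error(y_true: Sequence[Sequence[int]],
              scores: Sequence[Sequence[float]]) -> float:
    """Fraction of documents whose highest-scored label is not a true label."""
    misses = 0
    for true_row, score_row in zip(y_true, scores):
        top = max(range(len(score_row)), key=lambda j: score_row[j])
        misses += int(true_row[top] == 0)
    return misses / len(y_true)


def coverage(y_true: Sequence[Sequence[int]],
             scores: Sequence[Sequence[float]]) -> float:
    """Average number of extra steps down the ranked label list needed
    to reach every true label (maximum rank of a true label, minus one).
    Assumes each document has at least one true label."""
    total = 0
    for true_row, score_row in zip(y_true, scores):
        # Rank labels from highest score (rank 1) to lowest.
        order = sorted(range(len(score_row)), key=lambda j: -score_row[j])
        rank = {label: r for r, label in enumerate(order, start=1)}
        total += max(rank[j] for j, t in enumerate(true_row) if t) - 1
    return total / len(y_true)
```

All three are loss-style measures, so lower values indicate better performance; average precision, by contrast, is better when higher.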