Abstract

Aiming at the problem of conceptual ambiguity and underlying semantic structure of multi label text categorization, an ensemble classification method is proposed, which combines random forest (RF) algorithm and semantic core co-occurrence latent semantic vector space (CLSVSM). Through the random segmentation of words, the diversity of integration is increased, and the different orthogonal projection of low dimensional implicit semantic space is obtained. Random forest can effectively solve binary classification problem, and implicit semantics reveals the underlying semantic structure of text. The combination of them can represent the diversity and accuracy of individuals. The experimental results on Yahoo dataset demonstrate the effectiveness of the proposed method, which is superior to other methods in Hamming loss, coverage, first error and average accuracy.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.