Abstract

Because of the large number of terms and data patterns, it is difficult to guarantee the quality of the relevant features discovered in text documents that describe user preferences. Most widely used text mining and classification techniques have adopted term-based approaches. However, all of them suffer from the problems of polysemy and synonymy. It has long been hypothesised that pattern-based approaches should outperform term-based ones in describing user preferences, yet how to use large-scale patterns effectively remains a hard problem in text mining. This research introduces a novel model for relevance feature discovery to address this problem. It discovers both positive and negative patterns in text documents as higher-level features and uses them in place of low-level features (terms). It also classifies terms into categories and updates term weights according to their specificity and their distribution in the patterns. Substantial experiments with this model on the RCV1, TREC topics, and Reuters-21578 datasets show that it performs noticeably better than both state-of-the-art term-based methods and pattern-based methods.
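To make the idea of classifying terms by specificity more concrete, the following is a minimal toy sketch and not the model evaluated in this work. It assumes a hypothetical specificity score (the share of positive documents containing a term minus the share of negative documents containing it), hypothetical thresholds `hi` and `lo`, and hypothetical category names; the abstract does not define any of these, and the pattern mining and weight-update steps are omitted.

```python
def specificity(term, pos_docs, neg_docs):
    """Hypothetical score in [-1, 1]: fraction of positive documents that
    contain the term minus the fraction of negative documents that do."""
    pos = sum(term in d for d in pos_docs) / len(pos_docs)
    neg = sum(term in d for d in neg_docs) / len(neg_docs)
    return pos - neg


def categorise(terms, pos_docs, neg_docs, hi=0.6, lo=-0.6):
    """Split terms into three illustrative groups using assumed thresholds."""
    groups = {"positive_specific": [], "general": [], "negative_specific": []}
    for t in sorted(terms):
        s = specificity(t, pos_docs, neg_docs)
        if s >= hi:
            groups["positive_specific"].append(t)
        elif s <= lo:
            groups["negative_specific"].append(t)
        else:
            groups["general"].append(t)
    return groups


# Toy data: each document is represented as a set of terms.
pos_docs = [{"text", "mining", "pattern"}, {"pattern", "mining", "weight"}]
neg_docs = [{"stock", "market", "mining"}, {"stock", "price"}]

terms = set().union(*pos_docs, *neg_docs)
groups = categorise(terms, pos_docs, neg_docs)
print(groups)
```

In this toy data, a term such as "mining" appears in both relevant and irrelevant documents, so it falls into the general group, whereas a term confined to one side is treated as specific to that side. A weighting scheme could then boost or penalise terms by group, but the exact update rule is not stated in the abstract.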
