Abstract

Research on text classification has typically focused on keyword searches and statistical techniques. However, keywords alone cannot always distinguish relevant texts from irrelevant ones, and some relevant texts contain no reliable keywords at all. Our approach to text classification uses case-based reasoning to represent natural language contexts that can be used to classify texts with extremely high precision. The case base of natural language contexts is acquired automatically during sentence analysis, using a training corpus of texts and their correct relevancy classifications. A text is represented as a set of cases, and we classify a text as relevant if any of its cases is deemed relevant. We rely on the statistical properties of the case base to determine whether similar cases are highly correlated with relevance for the domain. Preliminary experiments suggest that case-based text classification can achieve very high levels of precision and that it outperforms our previous algorithms based on relevancy signatures.
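The classification scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each case is assumed to be a set of hypothetical feature strings, Jaccard overlap is swapped in as the similarity metric, and the thresholds `sim_threshold`, `rel_threshold`, and `min_support` are hypothetical parameters. A case counts as relevant only when enough similar stored cases exist and nearly all of them came from relevant training texts, which favors precision; a text is relevant if any of its cases is.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Tuple

# Assumption: a case is a frozenset of hypothetical feature strings
# (e.g. a trigger word plus features of its surrounding context).
Case = FrozenSet[str]


def jaccard(a: Case, b: Case) -> float:
    """Set-overlap similarity; an assumed stand-in for the real similarity metric."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class CaseBase:
    sim_threshold: float = 0.5  # hypothetical: minimum similarity to count as "similar"
    rel_threshold: float = 0.9  # hypothetical: required share of cases from relevant texts
    min_support: int = 3        # hypothetical: minimum number of similar cases retrieved
    cases: List[Tuple[Case, bool]] = field(default_factory=list)

    def train(self, corpus: Iterable[Tuple[List[Case], bool]]) -> None:
        """Store every case of each training text, tagged with that text's label."""
        for text_cases, relevant in corpus:
            for case in text_cases:
                self.cases.append((case, relevant))

    def case_is_relevant(self, probe: Case) -> bool:
        """Relevant if enough similar stored cases exist and most came from relevant texts."""
        similar = [rel for stored, rel in self.cases
                   if jaccard(probe, stored) >= self.sim_threshold]
        if len(similar) < self.min_support:
            return False
        return sum(similar) / len(similar) >= self.rel_threshold

    def classify(self, text_cases: List[Case]) -> bool:
        """A text is relevant if ANY of its cases is deemed relevant."""
        return any(self.case_is_relevant(c) for c in text_cases)


def c(*features: str) -> Case:
    """Convenience constructor for a case from hypothetical feature strings."""
    return frozenset(features)


# Usage with a toy, made-up training corpus of (cases, is_relevant) pairs.
cb = CaseBase()
cb.train([
    ([c("kidnapped", "passive", "subj:human")], True),
    ([c("kidnapped", "passive", "subj:human", "obj:none")], True),
    ([c("kidnapped", "passive", "subj:human")], True),
    ([c("kidnapped", "active", "subj:idea")], False),
])
print(cb.classify([c("kidnapped", "passive", "subj:human")]))  # True
print(cb.classify([c("kidnapped", "active", "subj:idea")]))    # False
```

In this toy run the second text is rejected not because its cases look irrelevant but because too few similar cases exist to support a confident decision; requiring minimum support is one simple way to trade recall for precision.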
