Abstract

Natural language processing can improve retrieval performance by extracting information about terms and their relationships from natural language text. This information is far richer than what is obtained with term frequency methods that assume the statistical independence of terms. Once acquired, this linguistic knowledge may be reflected in retrieval and filtering systems either in a modified query or in a modified document that incorporates the information. We can expect natural language processing to improve filtering and retrieval performance if, and only if, applying linguistically derived information increases the ability of the retrieval system to discriminate between documents of differing relevance. While linguistic knowledge may be obtained through purely statistical analysis, humans extract this same information without massive number-crunching capabilities. It is therefore likely that, for automated systems, linguistic methods will ultimately be simpler and faster at extracting information that improves retrieval performance than methods that explicitly incorporate higher-order statistical dependencies. We present a model of grammatical parsing and part-of-speech tagging that allows us to make specific claims about the level of retrieval and filtering performance that will be obtained when linguistic knowledge is incorporated. The model provides both upper and lower bounds on performance, corresponding to best-case and worst-case part-of-speech tagging.
