Abstract
We analyzed natural language document retrieval queries from the Thomas Cooper Library at the University of South Carolina to investigate the frequency of various types of ill-formed input, such as spelling errors, co-occurrence violations, conjunctions, ellipsis, and missing or incorrect punctuation. The primary goal was to determine whether ill-formed input occurs often enough to warrant detailed study. We found that most of the queries were sentence fragments and that many of them contained some type of ill-formed input. Conjunctions caused the most problems, followed by punctuation errors. Spelling errors occurred in a small number of the queries. The remaining types of ill-formed input considered, ellipsis and co-occurrence violations, were not found in the queries.