Abstract
TEXPROS (TEXT PROcessing System) is an intelligent document processing system; it supports storing, extracting, classifying, categorizing, retrieving, and browsing information from a variety of office documents [76]. This article presents a retrieval subsystem for TEXPROS that is capable of processing incomplete, imprecise, and vague queries and of providing semantically meaningful responses to the user. The design of the retrieval subsystem integrates several mechanisms to achieve these goals. First, a system catalog, which includes a thesaurus, is used to store knowledge about the database. Second, there is a query transformation mechanism composed of context construction and algebraic query formulation modules. Given an incomplete or imprecise query, the context construction module searches the system for the required terms and constructs a query that has a complete and precise representation. The resulting query is then formulated into an algebraic expression. Third, in practice, the user may not have a clear idea of what he is searching for. A browsing mechanism is employed in such situations to assist the user in the retrieval process. With the browser, vague queries can be entered into the system until sufficient information is obtained for the user to construct a query for his request. Finally, when query processing fails and returns a null answer, a generalizer mechanism is used to give the user a cooperative explanation for the null answer. The presented techniques contribute to our research toward the development of highly intelligent data processing facilities beyond the present scope of database technology.