Abstract

Searching for relevant documents in large document collections is one of the key tasks in the areas of the semantic web and knowledge technologies. This paper deals with the analysis and design of an improvement to information retrieval (IR) that uses a specific conceptual model created automatically from a set of text documents without semantic annotation. The conceptual model combines locally applied Formal Concept Analysis (FCA) with agglomerative clustering of the resulting local models into one structure, which is suitable for supporting the information retrieval process and can be combined with standard full-text search. FCA is one of the approaches that can be applied to conceptual modeling in the domain of text documents. An extension of classic FCA, which works on binary table data, is the one-sided fuzzy version, which works with real values in the object-attribute table (the document-term matrix in the case of a vector representation of text documents). In our approach, the starting set of documents is decomposed into smaller sets of similar documents using a partitional clustering algorithm. One concept lattice is then built for every cluster using the FCA method, and these FCA-based models are combined into a hierarchy of concept lattices using an agglomerative clustering algorithm. Finally, we define the basic details and methods of an IR system that combines standard full-text search with conceptual search (using the extracted concept hierarchy).
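
To make the one-sided fuzzy step concrete, the following is a minimal sketch of how concepts could be enumerated for a single small cluster. It assumes the common formulation in which the intent of a crisp set of documents is the per-term minimum weight and the extent of a fuzzy term vector is the set of documents that reach every threshold; the abstract does not spell out these operators, so they are an assumption here. The toy matrix and the names `TERMS`, `DOCS`, `intent`, `extent`, and `concepts` are hypothetical, and the brute-force enumeration over all subsets is only practical for very small clusters.

```python
from itertools import combinations

# Hypothetical toy document-term matrix for one cluster:
# rows are documents, columns are term weights in [0, 1].
TERMS = ["lattice", "cluster", "query"]
DOCS = {
    "d1": [0.9, 0.2, 0.0],
    "d2": [0.8, 0.3, 0.1],
    "d3": [0.1, 0.7, 0.6],
}


def intent(doc_ids):
    """Fuzzy intent: for each term, the minimum weight over the given documents."""
    if not doc_ids:
        # Assumed convention: the empty document set maps to the top fuzzy set.
        return tuple(1.0 for _ in TERMS)
    return tuple(min(DOCS[d][j] for d in doc_ids) for j in range(len(TERMS)))


def extent(weights):
    """Crisp extent: documents whose weight reaches the threshold for every term."""
    return frozenset(
        d for d, row in DOCS.items()
        if all(r >= w for r, w in zip(row, weights))
    )


def concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of documents."""
    found = {}
    ids = sorted(DOCS)
    for k in range(len(ids) + 1):
        for subset in combinations(ids, k):
            closed = extent(intent(subset))  # closure of the subset
            found[closed] = intent(closed)
    return found


if __name__ == "__main__":
    # Print concepts from the smallest extent to the largest.
    for ext, itt in sorted(concepts().items(), key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(ext), dict(zip(TERMS, itt)))
```

In this toy cluster the set {d1, d3} is not closed: its intent is also satisfied by d2, so its closure is the whole cluster. In the approach described above, one such lattice would be built per cluster before the lattices are merged by agglomerative clustering; that merging step is not shown here.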
