Abstract

Autonomous and distributed repositories of digital documents are maintained and managed independently, in accordance with each organization's business needs. Documents containing the same information in different repositories may be represented differently, making it hard to retrieve the desired information. The information explosion necessitates efficient techniques to unearth the relevant information from the haystack of online digital documents with both homogeneous and heterogeneous structures. Keyword-based information retrieval techniques help improve the recall of user query results but have low precision. To improve precision, we adopt a semantic, ontology-based technique for retrieving information from digital documents, and we maintain a dynamic, evolving domain ontology to accommodate the retrieved information. For search, we use a thematic similarity approach to enhance the precision of the results. We propose a comprehensive architecture for semantic information retrieval and search. Plain text is read semantically, and the extracted metadata is stored for later use in answering user queries. A triple-centric technique is used both to maintain source metadata (in case of a system crash) and to probe user queries in order to capture the context of their keywords. The precision and recall results of the semantic information retrieval and annotation technique are very promising. Semantic search using the thematic similarity approach shows better precision and recall than previous keyword-based search techniques.
