Abstract
Data analytics on large-scale government administrative data, which spans domains such as education, transport, energy, and health, can be enhanced by retrieving pertinent documents from diverse data sources. Without a supporting framework of metadata, big data analytics can be daunting. Even though statistical algorithms can perform extensive analyses on a variety of data with little help from metadata, applying these techniques to heterogeneous data may not always yield reliable findings. Recently, semantics-aware search (or semantic search) techniques have received much attention because they use implicit knowledge to enhance the search. Similarly, traditional search engines rely on the inherent linkages within the underlying data model to improve their search quality. For general-purpose information retrieval systems that gather information from the internet (open access data) or access open government administrative data, a domain-agnostic ontology can be employed to supply background knowledge. This paper draws on research undertaken by the authors at the IIIT Bangalore Center for Open Data Research (CODR) in developing a semantics-aware data lake framework to host and analyze government administrative data. In this study, we present an ontology-based document retrieval solution in which an ontology serves as an intermediary to close the gap between what the user seeks and what the search retrieves. Although our study setting is the Government of Karnataka (GoK), India, we believe the findings have wider resonance. Our experimental results on agricultural data from the GoK look promising.