Abstract
Governments are embracing an open data philosophy and making their data freely available to the public to encourage innovation and increase transparency. However, the number of available datasets is still limited. Finding relationships between related datasets on different data portals enables users to search the relevant datasets. These datasets are generated from the training data, which need to be curated by the user query. However, relevant dataset retrieval is an expensive operation due to the preparation procedure for each dataset. Moreover, it requires a significant amount of space and time. In this study, we propose a novel framework to identify the relationships between datasets using structural information and semantic information for finding similar datasets. We propose an algorithm to generate the Concept Matrix (CM) and the Dataset Matrix (DM) from the concepts and the datasets, which is then used to curate semantically related datasets in response to the users’ submitted queries. Moreover, we employ the proposed compression, indexing, and caching algorithms in our proposed scheme to reduce the required storage and time while searching the related ranked list of the datasets. Through extensive evaluation, we conclude that the proposed scheme outperforms the existing schemes.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.