On Revealing Shared Conceptualization Among Open Datasets

Miloš Bogdanović, Natasa Veljkovic, Milena Frtunic Gligorijevic, Darko Puflovic, Leonid Stoimenov

doi:10.2139/ssrn.3770603

Abstract

Openness and transparency initiatives are not only milestones of scientific progress but have also influenced various fields of organization and industry. Under this influence, a variety of government institutions worldwide have published a large number of datasets through open data portals. Government data covers diverse subjects, and the volume of available data grows every year. Published data is expected to be both accessible and discoverable, and portals rely on the metadata accompanying datasets to achieve this. However, part of this metadata is often missing, which reduces users' ability to find the information they need, and the problem grows as the number of published datasets grows. The approach we describe in this paper aims to mitigate this problem by implementing knowledge structures and algorithms capable of proposing the best-matching category for an uncategorized dataset. In doing so, our aim is twofold: to enrich dataset metadata by suggesting an appropriate category, and to increase the dataset's visibility and discoverability. Our approach relies on information about open datasets provided by users, namely the dataset descriptions contained in dataset tags. Since dataset tags exhibit low consistency owing to their user-generated origin, in this paper we will present a method of optimizing their usage by means of semantic similarity measures based on natural language processing mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal, by means of Formal Concept Analysis, the shared conceptualization that emerges from their usage. We will demonstrate the advantage of our proposal by comparing the concept lattices generated using Formal Concept Analysis before and after the optimization process, and we will use the generated structure as a knowledge base to categorize uncategorized open datasets.
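The tag-optimization and Formal Concept Analysis steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `DATASETS` context is made up, and the paper's NLP-based semantic similarity measures are replaced by a purely lexical stand-in (`difflib.SequenceMatcher` on normalized tags), which catches spelling variants such as "helth" but not true synonyms. The hypothetical `optimize_tags` merges tags whose similarity reaches an arbitrary 0.85 threshold, and `concepts` enumerates every formal concept (extent, intent) of the dataset-tag context by closing the object intents under intersection.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy context: dataset id -> user-supplied tags.
DATASETS = {
    "d1": {"air quality", "pollution", "environment"},
    "d2": {"air-quality", "Pollution", "health"},
    "d3": {"environment", "water"},
    "d4": {"helth", "hospitals"},
}


def normalize(tag):
    """Lower-case a tag and treat hyphens as spaces."""
    return tag.lower().replace("-", " ").strip()


def similarity(a, b):
    # Lexical stand-in (assumption) for an NLP-based semantic similarity
    # measure: a character-level ratio in [0, 1]. It will not catch synonyms.
    return SequenceMatcher(None, a, b).ratio()


def optimize_tags(datasets, threshold=0.85):
    """Reduce distinct tag values by normalizing and merging near-duplicates."""
    normalized = {d: {normalize(t) for t in ts} for d, ts in datasets.items()}
    tags = sorted({t for ts in normalized.values() for t in ts})
    parent = {t: t for t in tags}  # union-find forest over distinct tags

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in combinations(tags, 2):
        if similarity(a, b) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the lexicographically smaller tag as representative.
                parent[max(ra, rb)] = min(ra, rb)
    return {d: {find(t) for t in ts} for d, ts in normalized.items()}


def concepts(context):
    """Enumerate all formal concepts (extent, intent) of a dataset-tag context."""
    attributes = frozenset(t for ts in context.values() for t in ts)
    intents = {attributes}  # intent of the empty extent
    for tags in context.values():
        # Every intent is an intersection of some set of object intents.
        intents |= {frozenset(tags) & i for i in intents}
    return [
        (frozenset(d for d, ts in context.items() if intent <= ts), intent)
        for intent in intents
    ]


if __name__ == "__main__":
    optimized = optimize_tags(DATASETS)
    for label, ctx in (("before", DATASETS), ("after", optimized)):
        distinct = {t for ts in ctx.values() for t in ts}
        print(label, len(distinct), "tags,", len(concepts(ctx)), "concepts")
    # before 9 tags, 7 concepts
    # after 6 tags, 9 concepts
```

On this toy context the optimization reduces 9 distinct tags to 6, and the number of formal concepts rises from 7 to 9 because merged tags let datasets share intents such as {air quality, pollution}. The naive enumeration is exponential in the worst case; for real portals a dedicated FCA algorithm (for example, NextClosure) would be the usual choice.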
Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.
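A categorization step of this kind might look roughly as follows. This is a sketch under explicit assumptions, not the authors' mechanism: `KNOWLEDGE_BASE` is a hypothetical list of concept intents already labeled with a category (for example, the majority category of the categorized datasets in each concept's extent), `difflib.SequenceMatcher` again stands in for the semantic similarity measures, and the 0.5 cut-off in the hypothetical `propose_category` is arbitrary.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: formal-concept intents labeled with a category
# (e.g. the majority category of categorized datasets in each extent).
KNOWLEDGE_BASE = [
    (frozenset({"air quality", "pollution"}), "Environment"),
    (frozenset({"health", "hospitals"}), "Health"),
    (frozenset({"budget", "spending"}), "Finance"),
]


def tag_similarity(a, b):
    # Lexical stand-in (assumption) for a semantic similarity measure, in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def intent_similarity(tags, intent):
    """Average, over the dataset's tags, of each tag's best match in the intent."""
    if not tags or not intent:
        return 0.0
    return sum(max(tag_similarity(t, i) for i in intent) for t in tags) / len(tags)


def propose_category(tags, knowledge_base=KNOWLEDGE_BASE, min_score=0.5):
    """Return the category of the most similar concept, or None if none is close."""
    score, category = max(
        (intent_similarity(tags, intent), category)
        for intent, category in knowledge_base
    )
    return category if score >= min_score else None


if __name__ == "__main__":
    print(propose_category({"hospital", "healthcare"}))  # Health
    print(propose_category({"air pollution"}))  # Environment
    print(propose_category({"xyz"}))  # None
```

Scoring each tag of the uncategorized dataset against its best match within an intent tolerates spelling variants ("hospital" vs. "hospitals"), and returning `None` below the cut-off leaves a dataset uncategorized rather than forcing a weak match.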

More From: SSRN Electronic Journal

Similar Papers

On revealing shared conceptualization among open datasets
Miloš Bogdanović ... Leonid Stoimenov
Journal of Web Semantics | VOL. 66 | 16 Dec 2020

Generating Knowledge Structures from Open Datasets' Tags - An Approach Based on Formal Concept Analysis
Miloš Bogdanović ... Leonid Stoimenov
Facta Universitatis, Series: Automatic Control and Robotics | VOL. 20 | 14 Apr 2021

Open Data Categorization Based on Formal Concept Analysis
Milena Frtunic Gligorijevic ... Natasa Veljkovic
IEEE Transactions on Emerging Topics in Computing | VOL. 9 | 01 Apr 2021

Cross-portal metadata alignment – Connecting open data portals through means of formal concept analysis
Miloš Bogdanović ... Leonid Stoimenov
Information Sciences | VOL. 637 | 15 Apr 2023
