Extracting Topics from Semi-structured Data for Enhancing Enterprise Knowledge Graphs

Neda Abolhassani, Lakshmish Ramaswamy

doi:10.1007/978-3-030-30146-0_8

Publication Date: Jan 1, 2019

Citations: 3

Affiliation: University of Georgia

Abstract

Unifying information across organizational data silos that lack documentation, structure, and automated semantic discovery has attracted intense interest in recent years. The enterprise knowledge graph is a common tool for data integration and knowledge discovery, and it has become a backbone for APIs that demand access to structured knowledge. A previously overlooked piece in building enterprise knowledge graphs is an abstract layer of themes and concepts that is mapped to the documents stored as semi-structured files in databases. Augmenting enterprise knowledge graphs with concepts helps companies find trends in their data and get a holistic view of their entire data stores. The major challenge in extracting topics from semi-structured data is the lack of a corpus or description. In this research, we investigate the impact of self-supplementation of words and documents on probabilistic topic modeling of semi-structured data. Another contribution of this paper is identifying the tuning of probabilistic topic modeling that best fits semi-structured data. The extracted topics are potential summaries of the dataset and concepts about it. Moreover, they can be mapped back to their sources of origin to extend the enterprise knowledge graph. We consider two inference techniques and demonstrate the results on real data pools from Open City data and Kaggle, containing 7.5 GB and 1.15 GB of data, respectively, stored in MongoDB collections. We also propose a selection heuristic for effective identification of topics hidden in various data sources.
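
To make the "lack of corpus" challenge concrete, the following is a minimal sketch of how a semi-structured record might be turned into a bag of words before any topic model sees it. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`split_words`, `flatten_tokens`) and the sample records are hypothetical, and the choice to use both field names and string values as words is an assumption. The abstract does not specify how self-supplementation is performed, so that step is not shown.

```python
import re

def split_words(text):
    """Split a field name or string value into lowercase words.

    Handles camelCase and snake_case; digits and punctuation are dropped.
    """
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]

def flatten_tokens(record):
    """Collect words from the keys and string values of a nested record.

    Assumption: numbers, booleans, and nulls carry no topical words,
    so they contribute nothing.
    """
    tokens = []
    if isinstance(record, dict):
        for key, value in record.items():
            tokens.extend(split_words(key))
            tokens.extend(flatten_tokens(value))
    elif isinstance(record, list):
        for item in record:
            tokens.extend(flatten_tokens(item))
    elif isinstance(record, str):
        tokens.extend(split_words(record))
    return tokens

# Hypothetical records resembling documents in a MongoDB collection.
records = [
    {"parkName": "Riverside Park",
     "facilities": [{"type": "playground"}, {"type": "trail"}],
     "acres": 12.5},
    {"street_name": "Main Street", "pothole_reported": True},
]

# One pseudo-document (list of words) per record.
docs = [flatten_tokens(r) for r in records]
```

Each resulting word list is very short, which illustrates why a probabilistic topic model has little to work with on such data unless the words or documents are supplemented in some way.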

Full Text