Ontology Learning Based on Word Embeddings for Text Big Data Extraction

Nesma Mahmoud, Hatem M. Abdlkader, Heba Elbeh

doi:10.1109/icenco.2018.8636154

Abstract

The term big data describes data that exists everywhere in huge volumes, in raw form, and of heterogeneous types. Unstructured and uncategorized data make up 95% of big data. Domain-relevant data cannot be extracted efficiently from text big data within a suitable time. Text big data therefore remains a barrier to big data integration, and consequently to big data analytics, because big data integration cannot include text big data when preparing data for analytics. Ontology, on the other hand, represents information and knowledge as a graph schema that provides shareable, reusable, domain-specific data. Ontology therefore fits the need to extract domain-relevant data from text big data. Accordingly, this paper proposes an ontology learning (OL) methodology for text big data extraction. OL aims to provide algorithms, techniques, and tools for constructing ontologies automatically from text. The proposed OL method exploits a deep learning approach, namely word embeddings, and an advanced hierarchical clustering algorithm, namely BIRCH. Using word embeddings together with advanced hierarchical clustering improves OL quality in text big data extraction and reduces processing time. In addition, deep learning learns in an unsupervised manner from massive amounts of unlabeled, uncategorized raw data, which addresses the analytical challenge of text big data. For evaluation, precision, recall, and F-value are measured for quality, and running time is measured for performance. Quality is evaluated by comparing the results against gold standard datasets. Experimental results and evaluation demonstrate that the proposed OL methodology is efficient and suitable for text big data extraction.
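The abstract does not spell out how word embeddings and BIRCH are combined, so the following is only a hedged illustration of the two ideas it names: grouping word vectors with BIRCH's clustering feature CF = (N, LS, SS) under a radius threshold, and scoring the resulting groups against a gold standard with precision, recall, and F-value. Assumptions, all hypothetical: the 2-D vectors, the threshold of 0.3, the gold standard, and every function name. The clustering is a flat, single-pass simplification of BIRCH's first phase (no CF tree, no global refinement), and pairwise scoring is one common protocol, not necessarily the one the authors used.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations

Vector = list[float]


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature CF = (N, LS, SS) summarising a group of vectors."""
    n: int                 # number of vectors absorbed
    ls: Vector             # linear sum of the vectors
    ss: float              # sum of squared norms of the vectors
    members: list[str] = field(default_factory=list)  # words, kept only for inspection

    @classmethod
    def from_point(cls, word: str, v: Vector) -> "ClusteringFeature":
        return cls(1, list(v), sum(x * x for x in v), [word])

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
            self.members + other.members,
        )

    def centroid(self) -> Vector:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # R = sqrt(SS/N - ||LS/N||^2): RMS distance of members from the centroid
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def cf_cluster(embeddings: dict[str, Vector], threshold: float) -> list[ClusteringFeature]:
    """Simplified flat pass (hypothetical helper): absorb each word into the
    nearest CF entry if the merged radius stays <= threshold."""
    entries: list[ClusteringFeature] = []
    for word, vec in embeddings.items():
        point = ClusteringFeature.from_point(word, vec)
        if entries:
            i = min(range(len(entries)), key=lambda k: math.dist(entries[k].centroid(), vec))
            candidate = entries[i].merged(point)
            if candidate.radius() <= threshold:
                entries[i] = candidate
                continue
        entries.append(point)  # too far from every entry: start a new one
    return entries


def pairwise_prf(clusters: list[list[str]], gold: list[list[str]]) -> tuple[float, float, float]:
    """Precision, recall and F-value over word pairs that share a group (assumed protocol)."""
    def pairs(groups: list[list[str]]) -> set[frozenset[str]]:
        return {frozenset(p) for g in groups for p in combinations(g, 2)}

    predicted, expected = pairs(clusters), pairs(gold)
    tp = len(predicted & expected)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(expected) if expected else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical 2-D "embeddings"; real word vectors have hundreds of dimensions.
EMBEDDINGS = {
    "car": [1.0, 0.0], "truck": [0.9, 0.1], "bus": [1.0, 0.2],
    "apple": [0.0, 1.0], "banana": [0.1, 0.9],
}
# Hypothetical gold standard; "van" never occurs in the input, so recall drops.
GOLD = [["car", "truck", "bus", "van"], ["apple", "banana"]]

clusters = cf_cluster(EMBEDDINGS, threshold=0.3)
groups = [c.members for c in clusters]
print(groups)  # [['car', 'truck', 'bus'], ['apple', 'banana']]
print(tuple(round(x, 3) for x in pairwise_prf(groups, GOLD)))  # (1.0, 0.571, 0.727)
```

The CF summary is what lets BIRCH scale to large inputs: each entry stores only N, LS, and SS, so merging two groups costs O(d) for d-dimensional vectors regardless of how many words each group holds. For real use, a complete implementation such as scikit-learn's `sklearn.cluster.Birch` (parameters `threshold`, `branching_factor`, `n_clusters`) builds the full CF tree.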

Full Text