Semantically Enhanced and Minimally Supervised Models for Ontology Construction, Text Classification, and Document Recommendation

Wael Alkhatib

doi:10.25534/tuprints-00011890

Abstract

The proliferation of knowledge available on the web, along with the rapidly increasing number of accessible research publications, overwhelms researchers, students, and educators. Linked data platforms like SciGraph reduce this information overload by combining data from heterogeneous information sources and linking them to ontologies that describe how these resources are related. Linked data platforms provide functionalities to improve the accessibility and discoverability of these resources. These functionalities include methods for maintaining and updating the ontologies used, for assigning concepts to resources, and for recommending relevant resources. About 80% of information sources on the Internet originate in the form of unstructured content. This creates the need for automated methods that leverage the wealth of information embedded in unstructured content to realize these functionalities. This thesis contributes to three building blocks of the construction of linked data platforms from unstructured information sources, namely ontology construction and enrichment, text classification, and document recommendation. Most machine learning (ML) methods used for these problems rely intensively on complicated feature engineering, which is a tedious, time-consuming, and domain-specific process. Our work is motivated by the potential of lexical-semantic resources and deep learning to address the research challenges in current approaches. On the one hand, existing lexical-semantic resources encode various types of information about words, such as their meanings and semantic relations. On the other hand, deep learning methods have achieved state-of-the-art performance on challenging NLP problems, e.g., text classification and semantic relation extraction. The rise of distributed representations is key to the breakthrough of deep learning on various NLP tasks.
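Distributed representations map each word to a dense vector, so that semantic similarity between words can be measured geometrically, most commonly with cosine similarity. The following minimal sketch illustrates only that general idea: the three-dimensional vectors in `EMBEDDINGS` are hand-picked, hypothetical values rather than trained embeddings, and `most_similar` is an illustrative helper, not part of any system described in this thesis.

```python
import math

# Hypothetical toy embeddings (hand-picked 3-d vectors, NOT trained);
# real embeddings are learned from large corpora and have hundreds of dimensions.
EMBEDDINGS = {
    "ontology": [0.9, 0.1, 0.2],
    "taxonomy": [0.8, 0.2, 0.3],
    "citation": [0.1, 0.9, 0.4],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vocab=EMBEDDINGS):
    """Return the other vocabulary word whose vector is closest to `word`."""
    target = vocab[word]
    candidates = [w for w in vocab if w != word]
    return max(candidates, key=lambda w: cosine(target, vocab[w]))


if __name__ == "__main__":
    print(most_similar("ontology"))  # → taxonomy
```

With these toy vectors, "ontology" and "taxonomy" have a cosine similarity of about 0.98, while "ontology" and "citation" score about 0.28, which is the kind of regularity that embedding-based methods exploit instead of hand-crafted features.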
The focus of this work is to develop, implement, and evaluate new approaches that better leverage the semantic similarities and regularities between words in large text corpora to minimize the hand-crafted feature engineering in current approaches. With regard to ontology construction and enrichment, we present Onto.KOM: a minimally supervised ontology learning system that uses unstructured text as input in addition to existing lexical databases. We study the effectiveness of our approach for semantic relation classification with respect to different influencing aspects, namely the input representation, the deep network structure used, and the types of semantic relations. In the scope of multi-label text classification, our contributions lie in three main areas. First, we propose an approach for feature selection that uses the typed dependencies between words as a measure to select the most essential features. We compare our approach with multiple statistical and semantic-based techniques to investigate the advantage of leveraging the semantic and syntactic relationships between words to improve the quality of the selected features. Second, we analyse the performance of deep learning structures on a small dataset of long documents, where traditional techniques tend to perform better. In addition, we develop a new model that uses the distributed representations of document fragments and deep learning structures. We compare the new model with a wide range of feature selection and text classification techniques. Third, we address the label imbalance problem and the lack of sufficient training samples. In this scope, we develop a training-less classifier that uses lexical-semantic resources as the basis for classification. We transform the classification problem into a graph matching problem. Concerning the recommendation of relevant resources, we address the problem of citation recommendation as a particular use case of document recommendation.
We propose two models for combining heterogeneous information sources, such as the content of papers, co-authorship information, and previously cited papers, to provide personalized citation recommendations.
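A common, generic way to combine heterogeneous signals for citation recommendation is a weighted linear fusion of per-source scores. The following Python sketch assumes three hypothetical signals (term overlap with the manuscript being written, co-authorship with the user, and overlap between a candidate's references and papers the user has cited before) and hand-set weights. It is not either of the two models proposed in the thesis, whose details are not given here; `Paper`, `jaccard`, `score`, and `recommend` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    paper_id: str
    terms: set = field(default_factory=set)       # content words (e.g., from title/abstract)
    authors: set = field(default_factory=set)     # author names
    references: set = field(default_factory=set)  # ids of papers this paper cites


def jaccard(a, b):
    """Set overlap |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(query_terms, candidate, coauthors, cited_before, weights=(0.5, 0.3, 0.2)):
    """Hypothetical linear fusion of three heterogeneous signals (weights are assumptions)."""
    w_content, w_coauthor, w_history = weights
    # 1) content: term overlap between the query manuscript and the candidate
    content = jaccard(query_terms, candidate.terms)
    # 2) co-authorship: share of the candidate's authors the user has written with
    coauthor = (len(candidate.authors & coauthors) / len(candidate.authors)
                if candidate.authors else 0.0)
    # 3) citation history: overlap of the candidate's references with papers the user cited before
    history = jaccard(candidate.references, cited_before)
    return w_content * content + w_coauthor * coauthor + w_history * history


def recommend(query_terms, candidates, coauthors, cited_before, k=3):
    """Rank candidates by fused score (highest first) and return the top-k ids."""
    ranked = sorted(candidates,
                    key=lambda p: score(query_terms, p, coauthors, cited_before),
                    reverse=True)
    return [p.paper_id for p in ranked[:k]]


if __name__ == "__main__":
    candidates = [
        Paper("p1", {"ontology", "learning"}, {"B"}, {"p9"}),
        Paper("p2", {"image", "segmentation"}, {"C"}, {"p7"}),
        Paper("p3", {"ontology", "relation", "extraction"}, {"D"}, set()),
    ]
    print(recommend({"ontology", "learning", "relation"}, candidates,
                    coauthors={"B"}, cited_before={"p9"}, k=2))  # → ['p1', 'p3']
```

The fixed `weights` tuple keeps the contribution of each source explicit; a personalized model would typically learn such weights (or a non-linear combination) from observed citation data instead of setting them by hand.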

Similar Papers

Automatic extraction and visualization of semantic relations between medical entities from medicine instructions
Maofu Liu ... Huijun Hu
Multimedia Tools and Applications | VOL. 76
01 Dec 2015

Extracting Semantic Concepts and Relations from Scientific Publications by Using Deep Learning
Fatima N Al-Aswadi ... Keng Hoon Gan
01 Jan 2020

Multi-label Classification for Clinical Text with Feature-level Attention
Disheng Pan ... Li Yang
01 May 2020

Deep active learning for multi label text classification
Qunbo Wang ... Haobin Shi
Scientific Reports | VOL. 14
15 Nov 2024
