Performance analysis of machine learning classifiers on improved concept vector space models

Zenun Kastrati, Ali Shariq Imran

doi:10.1016/j.future.2019.02.006

Abstract

This paper provides a comprehensive performance analysis of parametric and non-parametric machine learning classifiers, including a deep feed-forward multilayer perceptron (MLP) network, on two variants of an improved Concept Vector Space (iCVS) model. In the first variant, a weighting scheme enhanced with the notion of concept importance is used to assess the weight of ontology concepts. Concept importance indicates how important a concept is in an ontology; it is computed automatically by converting the ontology into a graph and then applying one of the Markov-based algorithms. In the second variant of iCVS, concepts provided by the ontology and their semantically related terms are used to construct concept vectors in order to represent the document in a semantic vector space. We conducted various experiments using a variety of machine learning classifiers for three different models of document representation. The first model is a baseline concept vector space (CVS) model that relies on an exact/partial match technique to represent a document in a vector space. The second and third models are iCVS models that employ, respectively, an enhanced concept weighting scheme for assessing the weights of concepts (variant 1) and the acquisition of terms that are semantically related to concepts of the ontology for semantic document representation (variant 2). Additionally, seven different classifiers are compared for all three models using precision, recall, and F1 score. Results for multiple configurations of the deep learning architecture are obtained by varying the number of hidden layers and the number of nodes in each layer, and are compared to those obtained with conventional classifiers. The results show that classification performance depends strongly on the choice of classifier, and that Random Forest, Gradient Boosting, and the Multilayer Perceptron are among the classifiers that performed well for all three models.
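The abstract states only that concept importance is obtained by turning the ontology into a graph and applying "one of the Markov-based algorithms", without naming the algorithm. As a minimal sketch, assuming PageRank is that algorithm and assuming edges point from each concept to its parent (is-a) concept, importance can be computed by power iteration. The weighting step, which multiplies a concept's frequency in a document by its importance, is a hypothetical stand-in rather than the paper's formula, and the names `pagerank`, `concept_weights` and the toy ontology are illustrative only.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank; `graph` maps each concept to the concepts it links to."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            for v in targets:
                # Each node passes its damped rank equally to its out-neighbours.
                new[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def concept_weights(freq: Dict[str, int], importance: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical weighting: concept frequency in a document scaled by its importance."""
    return {c: f * importance.get(c, 0.0) for c, f in freq.items()}


# Toy ontology (hypothetical): edges point from each subclass to its parent concept.
ontology = {"Car": ["Vehicle"], "Truck": ["Vehicle"], "Vehicle": []}
importance = pagerank(ontology)
print(importance)
print(concept_weights({"Car": 3, "Vehicle": 1}, importance))
```

With child-to-parent edges, the general concept `Vehicle` collects rank from both subclasses and ends up with about 2.7 times the importance of `Car` (roughly 0.574 versus 0.213). Reversing the edge direction would instead favour the more specific concepts, so the direction is a design choice that would need to be checked against the paper.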

Highlights

The global Internet population reached 3.8 billion in 2017, up from 3.4 billion the year before, which is 47% of the world's population [1].
This is achieved using vector space document representations learned by deep learning and convolutional neural networks, with a test accuracy of 85%. Another example, a convolutional recurrent deep learning model for classification, is proposed in [7]. This approach is similar to our work, but our focus is on classifying documents rather than sentences, and we use feature vectors constructed from concepts derived from an ontology.
It describes the dataset used in the experiments that demonstrate the applicability of our proposed document representation models.

Summary

Introduction

The global Internet population reached 3.8 billion in 2017, up from 3.4 billion the year before, which is 47% of the world's population [1]. Despite the computational resources available nowadays, organizing and structuring this tremendous amount of data is not a trivial task, and without it, finding and extracting useful information is difficult. Two major limitations of this approach are: (1) it relies on an exact match technique, in which a document is represented in a vector space using concept vectors built by mapping terms occurring in the document to concepts appearing in an ontology; and (2) its weighting technique treats all concepts as equally important, regardless of where they are located in the hierarchy of the ontology [13].

2. Concept vectors used to represent the document in a semantic vector space are constructed using concepts provided by the ontology through the exact match technique and by acquiring terms that are related to, and can be attached to, concepts of that ontology.
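The contrast between the exact match technique and concept vectors extended with related terms can be illustrated with a small sketch. It assumes documents are already tokenised into single words and counts a concept as matched when a token equals its label (exact match) or, in the extended case, one of its related terms. The related-term list below is hand-written for illustration, whereas the paper acquires such terms by a method not described in this excerpt; partial matching and multi-word concept labels are not handled, and `build_lexicon` and `concept_vector` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set


def build_lexicon(concepts: Iterable[str],
                  related: Optional[Dict[str, Iterable[str]]] = None) -> Dict[str, Set[str]]:
    """Map each concept to the lowercase surface forms that count as a match."""
    lexicon = {c: {c.lower()} for c in concepts}  # exact match: the concept label only
    for concept, terms in (related or {}).items():
        # Extended variant: attach semantically related terms to the concept.
        lexicon.setdefault(concept, set()).update(t.lower() for t in terms)
    return lexicon


def concept_vector(tokens: List[str], lexicon: Dict[str, Set[str]],
                   concepts: List[str]) -> List[int]:
    """Count, for each concept in `concepts`, how many tokens match its surface forms."""
    counts: Counter = Counter()
    for token in tokens:
        t = token.lower()
        for c in concepts:
            if t in lexicon.get(c, set()):
                counts[c] += 1
    return [counts[c] for c in concepts]


concepts = ["Car", "Vehicle"]
tokens = "The automobile is a vehicle and a car".split()

exact = concept_vector(tokens, build_lexicon(concepts), concepts)
extended = concept_vector(tokens, build_lexicon(concepts, {"Car": ["automobile"]}), concepts)
print(exact)     # [1, 1]
print(extended)  # [2, 1]
```

The word "automobile" is invisible to the exact-match vector but contributes to `Car` once related terms are attached, which is the gap that acquiring semantically related terms is meant to close.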

Related work

Architecture of the proposed model

Text analysis module - TAM

Preprocessing

Concept extraction

Domain ontology

Weighting scheme

Document representation

Document classification

Results and analysis

Concept importance calculation

Performance evaluation of baseline CVS and iCVS


Journal: Future Generation Computer Systems (FGCS)
Publication date: Feb 18, 2019
Citations: 8
License type: CC BY

Similar Papers

An Improved Concept Vector Space Model for Ontology Based Classification
Zenun Kastrati ... Ali Shariq Imran
01 Nov 2015

A New Approach to Email Classification Using Concept Vector Space Model
Chao Zeng ... Juzhong Gu
01 Dec 2008

A Concept Vector Space Model for Semantic Kernels
Sujeevan Aseervatham
International Journal on Artificial Intelligence Tools, Vol. 18
01 Apr 2009

Adaptive Concept Vector Space Representation Using Markov Chain Model
Zenun Kastrati ... Ali Shariq Imran
01 Jan 2014
