Abstract

Contextual text feature extraction and classification play a vital role in multi-document summarization. Natural language processing (NLP) is one of the essential text-mining tools used to preprocess and analyze large document sets. Most conventional single-document feature extraction measures ignore the contextual relationships among the different contextual feature sets during document categorization. In addition, conventional word embedding models such as TF-IDF, ITF-IDF and GloVe are difficult to integrate into multi-domain feature extraction and classification because of their high misclassification rates and large candidate sets. To address these concerns, an advanced multi-document summarization framework was developed and tested on a number of large training datasets. In this work, a hybrid multi-domain GloVe word embedding model together with multi-document clustering and classification models was implemented to improve multi-document summarization for multi-domain document sets. Experimental results show that the proposed multi-document summarization approach achieves better accuracy, precision, recall, F-score and run time (ms) than the existing models.
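
As a rough illustration of the pipeline the abstract outlines, the sketch below vectorizes a handful of documents and clusters them by topic with scikit-learn. The TF-IDF vectorizer is only a stand-in for the paper's hybrid multi-domain GloVe embedding, and the example documents are invented for illustration.

    # Minimal sketch: vectorize documents, then group them into topical clusters.
    # TF-IDF stands in for the paper's hybrid multi-domain GloVe embedding step.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    documents = [
        "Stock markets rallied after the central bank held interest rates steady.",
        "The central bank signalled no change to interest rates this quarter.",
        "A new vaccine trial reported strong results in elderly patients.",
        "Researchers published phase-three vaccine trial data this week.",
    ]

    # Step 1: turn each document into a feature vector.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)

    # Step 2: cluster the documents so each cluster can be summarized separately.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

    for label, doc in zip(labels, documents):
        print(label, doc)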

Highlights

  • Machine learning (ML) has become a key approach to problem solving and data prediction

  • Machine learning allows a classifier to learn a set of rules, or a decision criterion, from a set of labelled data annotated by an expert

  • Specialized machine learning techniques such as neural networks (NN) and support vector machines (SVM) are used in many fields to classify data into one or more classes; however, traditional models must be improved to handle large, high-dimensional datasets (a brief sketch follows this list)
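
The following sketch shows the kind of SVM-based document classification the last highlight refers to, using scikit-learn's LinearSVC on TF-IDF features. The training documents and labels are invented placeholders, not data from the paper.

    # Hedged sketch of SVM text classification: TF-IDF features feed a linear SVM.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    train_docs = [
        "The striker scored twice in the final minutes of the match.",
        "The team secured the championship after a penalty shootout.",
        "Quarterly earnings beat analyst expectations by a wide margin.",
        "Shares fell sharply after the profit warning was issued.",
    ]
    train_labels = ["sports", "sports", "finance", "finance"]

    # Fit the pipeline on the labelled examples, then classify an unseen document.
    model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
    model.fit(train_docs, train_labels)

    print(model.predict(["Investors reacted to the bank's new earnings report."]))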



Introduction

Machine learning (ML) has become a key approach to problem solving and data prediction. In the field of machine learning, most research on classifying multi-domain document data has focused on binary classifiers. Approaches based on sentence extraction from documents are used in single-document summarization. Most single-document summarization systems employ a simple method for summary generation: extracting the first sentence of each paragraph and placing the sentences in the same order as they were written. Preserving the order of the sentences in the summary is one problem. Another challenge for news summarization systems is how to handle huge feature sets, as the complexity of weight adjustment grows exponentially with the number of features. Classification [9] can be defined as the procedure of assigning objects of interest to previously defined categories or classes.
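
The lead-sentence heuristic described above can be sketched in a few lines: take the first sentence of each paragraph and keep the paragraphs' original order. The regex-based sentence splitting below is a simplification for illustration, not the implementation used in the paper.

    # Sketch of lead-sentence extractive summarization for a single document.
    import re

    def lead_sentence_summary(document: str) -> str:
        # Paragraphs are assumed to be separated by blank lines.
        paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
        first_sentences = []
        for paragraph in paragraphs:
            # Split on sentence-ending punctuation followed by whitespace.
            sentences = re.split(r"(?<=[.!?])\s+", paragraph)
            first_sentences.append(sentences[0])
        # Sentences are joined in the same order the paragraphs were written.
        return " ".join(first_sentences)

    text = (
        "The city council approved the new budget on Monday. The vote was close.\n\n"
        "Critics argue the plan underfunds public transport. Supporters disagree."
    )
    print(lead_sentence_summary(text))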

