Abstract

Contextual text feature extraction and classification play a vital role in multi-document summarization. Natural language processing (NLP) is one of the essential text-mining tools used to preprocess and analyze large document sets. Most conventional single-document feature extraction measures ignore the contextual relationships among the different contextual feature sets during document categorization. In addition, conventional word embedding models such as TF-IDF, ITF-IDF and GloVe are difficult to integrate into multi-domain feature extraction and classification because of their high misclassification rates and large candidate sets. To address these concerns, an advanced multi-document summarization framework was developed and tested on a number of large training datasets. In this work, a hybrid multi-domain GloVe word embedding model together with multi-document clustering and classification models was implemented to improve multi-document summarization for multi-domain document sets. Experimental results show that the proposed multi-document summarization approach achieves better accuracy, precision, recall, F-score and run time (ms) than the existing models.
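
As a rough illustration of the pipeline the abstract outlines, the sketch below vectorizes a handful of documents and clusters them by topic with scikit-learn. The TF-IDF vectorizer is only a stand-in for the paper's hybrid multi-domain GloVe embedding, and the example documents are invented for illustration.

    # Minimal sketch: vectorize documents, then group them into topical clusters.
    # TF-IDF stands in for the paper's hybrid multi-domain GloVe embedding step.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    documents = [
        "Stock markets rallied after the central bank held interest rates steady.",
        "The central bank signalled no change to interest rates this quarter.",
        "A new vaccine trial reported strong results in elderly patients.",
        "Researchers published phase-three vaccine trial data this week.",
    ]

    # Step 1: turn each document into a feature vector.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)

    # Step 2: cluster the documents so each cluster can be summarized separately.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

    for label, doc in zip(labels, documents):
        print(label, doc)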

Highlights

  • Machine learning (ML) has become a key approach to problem solving and data prediction

  • Machine learning allows a classifier to learn a set of rules, or a decision criterion, from a set of labelled data annotated by an expert

  • Specialized machine learning techniques such as neural networks (NN) and support vector machines (SVM) are used in many fields to classify data into one or more classes; however, traditional models must be improved to handle large, high-dimensional datasets (a brief sketch follows this list)
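
The following sketch shows the kind of SVM-based document classification the last highlight refers to, using scikit-learn's LinearSVC on TF-IDF features. The training documents and labels are invented placeholders, not data from the paper.

    # Hedged sketch of SVM text classification: TF-IDF features feed a linear SVM.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    train_docs = [
        "The striker scored twice in the final minutes of the match.",
        "The team secured the championship after a penalty shootout.",
        "Quarterly earnings beat analyst expectations by a wide margin.",
        "Shares fell sharply after the profit warning was issued.",
    ]
    train_labels = ["sports", "sports", "finance", "finance"]

    # Fit the pipeline on the labelled examples, then classify an unseen document.
    model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
    model.fit(train_docs, train_labels)

    print(model.predict(["Investors reacted to the bank's new earnings report."]))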



Introduction

Machine learning (ML) has become a key approach to problem solving and data prediction. In the field of machine learning, most research on classifying multi-domain document data has focused on binary classifiers. Approaches based on sentence extraction from documents are used in single-document summarization. Most single-document summarization systems employ a simple method for summary generation: extracting the first sentence of each paragraph and placing the sentences in the same order as they were written. Preserving the order of the sentences in the summary is one problem. Another challenge for news summarization systems is how to handle huge feature sets, as the complexity of weight adjustment grows exponentially with the number of features. Classification [9] can be defined as the procedure of assigning objects of interest to previously defined categories or classes.
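
The lead-sentence heuristic described above can be sketched in a few lines: take the first sentence of each paragraph and keep the paragraphs' original order. The regex-based sentence splitting below is a simplification for illustration, not the implementation used in the paper.

    # Sketch of lead-sentence extractive summarization for a single document.
    import re

    def lead_sentence_summary(document: str) -> str:
        # Paragraphs are assumed to be separated by blank lines.
        paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
        first_sentences = []
        for paragraph in paragraphs:
            # Split on sentence-ending punctuation followed by whitespace.
            sentences = re.split(r"(?<=[.!?])\s+", paragraph)
            first_sentences.append(sentences[0])
        # Sentences are joined in the same order the paragraphs were written.
        return " ".join(first_sentences)

    text = (
        "The city council approved the new budget on Monday. The vote was close.\n\n"
        "Critics argue the plan underfunds public transport. Supporters disagree."
    )
    print(lead_sentence_summary(text))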

