Text Coherence Analysis based on Misspelling Oblivious Word Embeddings and Deep Neural Network

Md. Anwar Hussen Wadud, Md. Rashadul

doi:10.14569/ijacsa.2021.0120124

Open Access

Abstract

Text coherence analysis is one of the most challenging tasks in Natural Language Processing (NLP), more so than other subfields such as text generation, translation, or text summarization. Many text coherence methods exist in NLP; most are graph-based or entity-based methods designed for short text documents. For long text documents, however, existing methods yield low accuracy, which is the biggest challenge in text coherence analysis for both English and Bengali. This is because existing methods do not account for misspelled words in a sentence and therefore cannot assess text coherence accurately. This paper proposes a text coherence analysis method based on the Misspelling Oblivious Word Embedding Model (MOEM) and a deep neural network. MOEM replaces all misspelled words with their correct forms and captures the interaction between sentences by computing their matches using word embeddings. A deep neural network architecture is then used to train and test the model. The study examines two datasets, one in Bengali and one in English, to analyze text consistency based on sentence sequence activities and to evaluate the effectiveness of the model. The Bengali dataset contains 7121 text documents, of which 5696 (80%) are used for training and 1425 (20%) for testing. The English dataset contains 7500 text documents, of which 6000 (80%) are used for training and 1500 (20%) for model evaluation. The efficiency of the proposed model is compared with existing text coherence analysis techniques. Experimental results show that the proposed model significantly improves automatic text coherence detection, with 98.1% accuracy in English and 89.67% accuracy in Bengali. Finally, the proposed model is compared with other existing text coherence models on both the English and Bengali datasets.
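To make the idea of correcting misspellings and then matching sentences through word embeddings more concrete, here is a minimal sketch. It is not the authors' MOEM or their neural network: the toy vectors in `TOY_VECTORS`, the `difflib`-based correction step, the averaging of word vectors, and the function names are all assumptions chosen for illustration.

```python
import difflib
import math

# Hypothetical 3-dimensional word vectors; a real system would use trained embeddings.
TOY_VECTORS = {
    "the": (0.1, 0.1, 0.1),
    "cat": (0.9, 0.1, 0.0),
    "dog": (0.8, 0.2, 0.0),
    "sleeps": (0.7, 0.3, 0.1),
    "stock": (0.0, 0.1, 0.9),
    "market": (0.05, 0.15, 0.85),
    "fell": (0.1, 0.2, 0.8),
}


def correct(word):
    """Return the closest vocabulary word, or None if nothing is similar enough."""
    if word in TOY_VECTORS:
        return word
    matches = difflib.get_close_matches(word, list(TOY_VECTORS), n=1, cutoff=0.6)
    return matches[0] if matches else None


def sentence_vector(sentence):
    """Average the vectors of the (corrected) words in a sentence."""
    words = [correct(w) for w in sentence.lower().split()]
    vectors = [TOY_VECTORS[w] for w in words if w is not None]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_score(sentences):
    """Mean cosine similarity between each pair of adjacent sentences."""
    vectors = [sentence_vector(s) for s in sentences]
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims) if sims else 0.0


if __name__ == "__main__":
    print(coherence_score(["The caat sleeps", "The dog sleeps"]))
    print(coherence_score(["The cat sleeps", "The stock markte fell"]))
```

In this sketch the misspelled "caat" is mapped to "cat" before embedding, so the first pair of sentences scores the same as its correctly spelled version, while the topic shift in the second pair lowers the score. The paper's model learns this behaviour from data; the sketch only shows the shape of the computation.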

Highlights

Text coherence analysis is a well-known task in natural language processing for texts with multiple sentences [1]
Identifying sentences that contain misspellings and deriving the word vectors of the correct words from the misspelled ones adds a new dimension to coherence analysis
1) Model inputs: Because this study considers out-of-vocabulary words, misspelled words, and so on, the input of the proposed coherence model is the output of the misspelling word embedding model, that is, word vectors for different types of words
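One way an embedding model can still produce a vector for an out-of-vocabulary or misspelled word is to build it from character n-grams. The sketch below illustrates that idea only. The page does not say how MOEM constructs its vectors, so the n-gram scheme, the hashed pseudo-random vectors (standing in for trained ones), and every name here are assumptions.

```python
import hashlib
import math

DIM = 256  # vector size; chosen large so random n-gram vectors are nearly orthogonal


def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of the word wrapped in boundary markers '<' and '>'."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram):
    """Deterministic pseudo-random vector for an n-gram (stand-in for a trained one)."""
    digest = hashlib.shake_256(ngram.encode("utf-8")).digest(DIM)
    return [(byte - 127.5) / 127.5 for byte in digest]


def word_vector(word):
    """Average of the word's n-gram vectors, so any string gets a vector."""
    vectors = [ngram_vector(g) for g in char_ngrams(word)]
    if not vectors:
        return [0.0] * DIM
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    right, wrong, other = "coherence", "coherance", "bengali"
    print(cosine(word_vector(right), word_vector(wrong)))
    print(cosine(word_vector(right), word_vector(other)))
```

Because "coherance" shares most of its n-grams with "coherence", the two vectors end up far closer to each other than either is to the vector of an unrelated word. A coherence model fed with such vectors therefore sees nearly the same input for a misspelled word as for its correct form.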

Summary

Introduction

Text coherence analysis is a well-known task in natural language processing for texts with multiple sentences [1]. With the rapid development of digital communication media such as social networks, mobile devices, and online news portals, it has become harder to identify which information is consistent and which is inconsistent. Without automatic evaluation, it is very difficult to check the consistency of text across sentences in a short time. During digital communication, online assessment, or news reporting, a naive user may sometimes misspell one or more words in a text [2]. Common errors such as grammatical mistakes, vocabulary errors, or syntax errors can be detected, but determining text coherence between paragraphs is very difficult in both manual and computerized systems.

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2021
Citations: 11
License type: CC BY
