Abstract

Discourse coherence is strongly associated with text quality, making it important to both natural language generation and understanding. However, existing coherence models focus on measuring individual aspects of coherence, such as lexical overlap, entity centering, or rhetorical structure, and lack a measurement of the semantics of the text. In this paper, we propose a discourse coherence analysis method that combines sentence embeddings with a dimension grid: we obtain sentence-level vector representations via deep learning and introduce a coherence model that captures fine-grained semantic transitions in text. Our work is based on the hypothesis that each dimension of the embedding vector carries a specific, stable semantic meaning. We treat every dimension as an equal grid cell and compute its transition probabilities, and we further enrich the document feature vector to model coherence. Experimental results demonstrate that our method achieves excellent performance on two coherence-related tasks.
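To make the dimension-grid idea concrete, here is a minimal sketch of how per-dimension semantic transitions could be counted. The encoder, the equal-width bucketing scheme, and the number of states are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the dimension-grid idea described in the abstract.
# Assumptions (not from the paper): sentence embeddings arrive as a
# (num_sentences, dim) array from some encoder; each dimension is
# discretized into a small number of states before counting transitions.
import numpy as np
from collections import Counter

def dimension_grid_features(embeddings: np.ndarray, n_states: int = 3) -> np.ndarray:
    """Return per-document transition-probability features.

    embeddings: (num_sentences, dim) sentence vectors, one row per sentence.
    n_states:   number of discrete states per dimension (an illustrative
                choice, not a value taken from the paper).
    """
    num_sent, dim = embeddings.shape
    # Discretize each dimension independently into equal-width buckets.
    lo = embeddings.min(axis=0, keepdims=True)
    hi = embeddings.max(axis=0, keepdims=True)
    states = np.floor((embeddings - lo) / (hi - lo + 1e-12) * n_states)
    states = np.clip(states, 0, n_states - 1).astype(int)  # (num_sent, dim)

    # Count state transitions between consecutive sentences, pooled over dimensions.
    counts = Counter()
    for i in range(num_sent - 1):
        for d in range(dim):
            counts[(states[i, d], states[i + 1, d])] += 1
    total = sum(counts.values()) or 1

    # Feature vector: probability of every (state -> state) transition type.
    feats = np.zeros(n_states * n_states)
    for (a, b), c in counts.items():
        feats[a * n_states + b] = c / total
    return feats
```

Given the sentence vectors of a document, such a fixed-length feature vector could be fed to any classifier or ranker; how the paper combines these transition features with the enriched document feature vector is not reproduced here.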

Highlights

  • In a well-written text, sentences are structured to convey the author's purpose, and each new piece is interpretable given the preceding context; this property is called "coherence." Modelling text coherence has long been important for multidocument summarization (MDS) and retrieval-based question answering (QA), and it now intersects with essay scoring [1, 2] and automatic text generation [3, 4]

  • A well-written text reflects a rigorous flow of logic


Summary

Introduction

In a well-written text, sentences are structured to convey the author's purpose, and each new piece is interpretable given the preceding context; this property is called "coherence." Modelling text coherence has long been important for multidocument summarization (MDS) and retrieval-based question answering (QA), and it now intersects with essay scoring [1, 2] and automatic text generation [3, 4]. Given a text composed of sentences, we aim to evaluate its rationality and determine its coherence quality. Previous work on modelling coherence quality has mostly used two approaches: dimensionality reduction techniques such as latent semantic analysis (LSA), and the entity grid and its extensions based on Markov theory. In LSA-based approaches, some values in the generated matrix, especially negative ones, lack a human interpretation, and polysemy and word order cannot be handled properly. Therefore, the entity grid model [5], which is well interpretable and has a strong theoretical basis, has gradually been adopted by researchers. The entity grid is a statistical model based on centering theory [6]: it represents a text by the transitions of the syntactic roles of important entities across sentences and computes the probability of the text from these transitions.
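For contrast with the dimension grid, the sketch below shows the kind of entity-grid computation the paragraph above describes: entities are tracked across sentences by their syntactic role (S subject, O object, X other, '-' absent), and the document is characterised by the distribution of role-to-role transitions. The toy grid is invented for illustration; a real system would build it with a syntactic parser and coreference resolution.

```python
# Illustrative sketch of entity-grid transition features in the spirit of
# the entity grid model [5]. The grid is assumed to be already extracted:
# one row per entity, one column per sentence, roles 'S', 'O', 'X', '-'.
from collections import Counter
from itertools import product

ROLES = ["S", "O", "X", "-"]

def transition_probabilities(grid: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Probability of each role-to-role transition between adjacent sentences."""
    counts = Counter()
    for roles in grid.values():              # one row per entity
        for a, b in zip(roles, roles[1:]):   # adjacent sentence pairs
            counts[(a, b)] += 1
    total = sum(counts.values()) or 1
    return {t: counts[t] / total for t in product(ROLES, repeat=2)}

# Toy example: "Obama" is subject in sentences 1-2 and absent in sentence 3.
grid = {
    "Obama":  ["S", "S", "-"],
    "speech": ["O", "-", "X"],
}
probs = transition_probabilities(grid)
print(probs[("S", "S")])  # frequent S->S transitions indicate a sustained focus
```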
