Chinese Automatic Text Simplification Based on Unsupervised Learning

Abstract

This paper introduces a Chinese automatic text simplification (ATS) method based on unsupervised learning. Automatic text simplification is a research field within natural language processing. For Chinese, relying on hand-made simplified corpora or dictionaries is impractical given the large volume of text. Chinese is also a diverse language, so numerous factors must be taken into consideration. We propose an automatic simplification method for Chinese text together with a readability formula based on linear regression. With our method, a user inputs a set of Chinese sentences and obtains more comprehensible sentences through syntactic and lexical simplification. In an automatic evaluation on a hand-made simplified corpus, the readability score of our system's output increased by 3.68 compared with the original text, and the SARI score reached 36.02.
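
The abstract does not say which features the readability formula uses or how they are weighted. As a rough illustration only, the sketch below fits a generic linear readability formula by ordinary least squares (normal equations solved with Gauss-Jordan elimination). The two features (average sentence length in characters and share of uncommon words), the toy scores, the convention that a higher score means easier text, and the helper names `fit_linear_readability` and `readability` are all hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def fit_linear_readability(features: List[Sequence[float]],
                           scores: Sequence[float]) -> List[float]:
    """Ordinary least squares fit; returns [intercept, w1, ..., wk].

    Hypothetical helper: the paper's actual features and coefficients are unknown.
    """
    # Prepend a constant 1.0 column so the first weight is the intercept.
    X = [[1.0, *row] for row in features]
    n, k = len(X), len(X[0])
    # Build the normal equations (X^T X) w = X^T y as an augmented matrix.
    A = []
    for i in range(k):
        row = [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        row.append(sum(X[r][i] * scores[r] for r in range(n)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # After elimination the last column holds the solution.
    return [A[i][k] for i in range(k)]


def readability(weights: Sequence[float], feature_row: Sequence[float]) -> float:
    """Apply a fitted linear formula to one text's feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


if __name__ == "__main__":
    # Toy data: (avg. sentence length in characters, share of uncommon words).
    rows = [(10, 0.1), (20, 0.2), (15, 0.5), (30, 0.05), (25, 0.3)]
    human_scores = [7.5, 5.0, 4.5, 3.75, 3.5]
    weights = fit_linear_readability(rows, human_scores)
    print([round(w, 3) for w in weights])
```

Because the toy scores were generated from exactly `10 - 0.2 * length - 5 * uncommon_share`, the fit should recover those coefficients; with real data one would instead fit against human readability judgements and check the residuals.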

Similar Papers
  • Research Article
  • Cite count: 106
  • 10.1145/2738046
Making It Simplext
  • May 11, 2015
  • ACM Transactions on Accessible Computing
  • Horacio Saggion + 5 more

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, our results are comparable to the state of the art in English automatic text simplification.

  • Research Article
  • Cite count: 5
  • 10.1109/access.2022.3174846
Pattern-Based Syntactic Simplification of Compound and Complex Sentences
  • Jan 1, 2022
  • IEEE Access
  • Archana Praveen Kumar + 4 more

With the advent of new technologies, automatic text simplification has become popular and important among natural language researchers over the last decade. Research in Automatic Sentence Simplification (ASS) predominantly focuses on either lexical or syntactic simplification of sentences. A literature survey shows that existing research in lexical simplification relies on word substitution, which causes word sense ambiguity when the synonyms are not appropriate for the sentence in its context. In contrast, syntactic simplification, though accurate and applicable to Natural Language Processing (NLP) tasks, requires tremendous effort to construct rules for a given domain. This research proposes a framework called Pattern-based Automatic Syntactic Simplification (PASS), which identifies sentences and applies rules based on grammatical patterns to simplify them, making the approach more generic for NLP tasks. PASS is evaluated by human experts, who rate the usefulness of the framework based on the fluency, adequacy and simplicity of the sentences. Furthermore, the framework is automatically evaluated on an available online corpus using the automatic metrics SARI, BLEU, and FKGL. The proposed approach generates promising results in the field of ASS and could be used as a preliminary module for NLP tasks as well as for other natural language applications such as summarization, anaphora resolution, and question answering.
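
FKGL, one of the metrics used above, is the Flesch-Kincaid Grade Level: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, where lower values indicate easier English text. The minimal sketch below computes it from pre-counted totals; syllable counting is deliberately left to the caller because heuristics differ between tools. It is an illustration of the standard formula, not the PASS evaluation code, and the function name `fkgl` is our own.

```python
def fkgl(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from pre-counted totals.

    Syllable counting is left to the caller, because heuristics
    differ between tools.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Toy counts: 100 words, 5 sentences, 150 syllables.
    print(round(fkgl(100, 5, 150), 2))
```

For the toy counts this evaluates to 0.39 × 20 + 11.8 × 1.5 − 15.59, about 9.91. Since the formula depends on English syllable counts, it does not transfer directly to Chinese text.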

  • Conference Article
  • Cite count: 44
  • 10.1145/3313831.3376563
Automatic Text Simplification Tools for Deaf and Hard of Hearing Adults: Benefits of Lexical Simplification and Providing Users with Autonomy
  • Apr 21, 2020
  • Oliver Alonzo + 3 more

Automatic Text Simplification (ATS), which replaces text with simpler equivalents, is rapidly improving. While some research has examined ATS reading-assistance tools, little has examined the preferences of adults who are deaf or hard of hearing (DHH), and none has empirically evaluated lexical simplification technology (replacement of individual words) with these users. Prior research has revealed that U.S. DHH adults have lower reading literacy on average than their hearing peers, with unique characteristics in their literacy profile. We investigate whether DHH adults perceive a benefit from lexical simplification applied automatically or when users are given greater autonomy, with on-demand control and visibility of which words are replaced. Formative interviews guided the design of an experimental study in which DHH participants read English texts in their original form and with lexical simplification applied automatically or on demand. Participants indicated that they perceived a benefit from lexical simplification, and they preferred a system with on-demand simplification.

  • Book Chapter
  • Cite count: 4
  • 10.1093/oxfordhb/9780199573691.013.52
Text Simplification
  • Feb 5, 2018
  • Horacio Saggion

Over the past decades, information has become available to a broad audience thanks to the availability of texts on the Web. However, understanding the wealth of information contained in texts can be difficult for a number of people, including those with poor literacy, cognitive or linguistic impairments, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology to simplify sentences so that they would be easier for natural-language processing components such as parsers to process. Nowadays, however, automatic text simplification is conceived as a technology to transform a text into an equivalent that is easier for a target user to read and understand. Text simplification concerns both the modification of the text's vocabulary (lexical simplification) and the modification of sentence structure (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods for addressing these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.

  • Conference Article
  • Cite count: 1
  • 10.5121/csit.2022.121518
GRASS: A Syntactic Text Simplification System based on Semantic Representations
  • Sep 17, 2022
  • Rita Hijazi + 2 more

Automatic Text Simplification (ATS) is the process of reducing a text's linguistic complexity to improve its understandability and readability while maintaining its original information, content, and meaning. Several text transformation operations can be performed, such as splitting a sentence into several shorter sentences, substituting complex elements, and reorganizing content. It has been shown that implementing these operations mainly at the syntactic level causes several problems that could be solved by using semantic representations. In this paper, we present GRASS (GRAph-based Semantic representation for syntactic Simplification), a rule-based automatic syntactic simplification system that uses semantic representations. The system syntactically transforms complex constructions, such as subordinate clauses, appositive clauses, coordinate clauses, and passive forms, into simpler sentences. It is based on a graph-based meaning representation of the text expressed in DMRS (Dependency Minimal Recursion Semantics) notation and uses rewriting rules. On a reference corpus and according to specific metrics, the experimental results outperform those obtained by other state-of-the-art systems on the same corpus.

  • Book Chapter
  • 10.1007/978-3-031-02166-4_9
Conclusion
  • Jan 1, 2017
  • Synthesis lectures on human language technologies
  • Horacio Saggion

In recent years, automatic text simplification has attracted the attention of researchers in natural language processing. Research is improving steadily. It is a difficult task for human editors to produce a text that will match the reading abilities of a target population. Therefore, it is an even more difficult task for machines, which are, for the time being, deprived of the necessary linguistic and world knowledge. However, by addressing such an important societal challenge, researchers have created new methods and repurposed old ones. In this book, we have partially covered three relevant simplification topics: text readability, lexical simplification, and syntactic simplification.

  • Research Article
  • Cite count: 6
  • 10.3934/mbe.2020202
Topic-based automatic summarization algorithm for Chinese short text.
  • Jan 1, 2020
  • Mathematical Biosciences and Engineering
  • Tinghuai Ma + 4 more

Most current automatic summarization methods target English text. In Chinese text, words differ widely, parts of speech are numerous and complex, and polysemous or ambiguous words appear frequently. It is therefore more difficult to extract useful feature words from Chinese text than from English text. Owing to the complex syntax of Chinese, relatively few automatic summarization methods currently exist for Chinese text. Earlier methods could only select the important sentences from the original text and arrange them simply, producing summaries with disordered sentences and insufficient coherence. Moreover, because Chinese short texts usually contain more redundant information and have irregular sentence structure, we propose a topic-based automatic summarization method for Chinese short text. First, a key sentence selection method that combines topic words and TF-IDF is proposed to score each text in the original data with respect to its topic. The highest-scoring sentence is then selected as the topic sentence. Because Weibo short texts may contain a lot of irrelevant information and sometimes even lack important components of a topic, three retouching mechanisms are proposed to improve the conciseness, richness and readability of the extracted topic sentences. We validate our approach on natural disaster and social hot event datasets from Sina Weibo. The experimental results show that the polished topic summary not only reflects the exact relationship between topic sentences and natural disasters or social hot events, but also carries rich semantic information. More importantly, the basic elements of a natural disaster or social hot event can almost be grasped from the topic sentence, which can help the government guide disaster relief and meet users' needs for quickly obtaining information about social hot events.
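
The abstract says key sentences are scored by combining topic words with TF-IDF, but not how the two are combined. The sketch below assumes one plausible combination: a sentence's score is the sum of its tokens' TF-IDF weights, with topic-word weights multiplied by a hypothetical `topic_boost` factor, and the highest-scoring sentence becomes the topic sentence. It further assumes that sentences are already segmented into words (Chinese word segmentation is a separate step), treats each sentence as a document when computing IDF, and uses the smoothed IDF log((1 + n) / (1 + df)) + 1. The function names and the toy posts are illustrative, not the authors' code or data.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def sentence_scores(sentences: Sequence[List[str]], topic_words: Set[str],
                    topic_boost: float = 2.0) -> List[float]:
    """Score pre-segmented sentences by summed TF-IDF, boosting topic words."""
    n = len(sentences)
    # Document frequency: in how many sentences each token appears.
    df = Counter(tok for sent in sentences for tok in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        total = 0.0
        for tok, count in Counter(sent).items():
            tf = count / len(sent)
            idf = math.log((1 + n) / (1 + df[tok])) + 1.0  # smoothed IDF
            weight = tf * idf
            if tok in topic_words:
                weight *= topic_boost  # assumed way of favouring topic words
            total += weight
        scores.append(total)
    return scores


def select_topic_sentence(sentences: Sequence[List[str]], topic_words: Set[str],
                          topic_boost: float = 2.0) -> int:
    """Return the index of the highest-scoring sentence."""
    if not sentences:
        raise ValueError("no sentences to select from")
    scores = sentence_scores(sentences, topic_words, topic_boost)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy, already-segmented posts (hypothetical data).
    posts = [
        ["地震", "造成", "伤亡"],
        ["今天", "天气", "好"],
        ["地震", "救援", "地震", "展开"],
    ]
    print(select_topic_sentence(posts, {"地震", "救援"}))
```

In the toy example the third post wins (about 2.56, against roughly 1.99 and 1.69 for the others) because it repeats the topic word 地震 and also contains 救援. The boost factor is a design knob: larger values push selection toward topic coverage, smaller values toward generally distinctive wording.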

  • Conference Article
  • Cite count: 1
  • 10.1109/icais53314.2022.9743107
Intelligent Input and Analysis System of English Characters from the Perspective of Semantic Recognition
  • Feb 23, 2022
  • Ling Li + 1 more

This paper draws on basic ideas from Lexical Functional Grammar (LFG) and knowledge of semantic recognition, and presents an expectation-driven intelligent analysis system for English and Chinese character input based on grammatical and semantic analysis. Using the language model BERT from natural language processing, it proposes a method for hiding information in Chinese and English text based on input errors. Because the features of Chinese and English text are quite different, and the embedding method is word-based, Chinese text additionally involves the task of Chinese word segmentation. The article therefore proposes separate information embedding and blind extraction methods for Chinese text and English text. The method replaces some characters in plain text with generated input errors in order to embed information.

  • Research Article
  • Cite count: 1
  • 10.1215/23290048-9965684
Languages, Scripts, and Chinese Texts in East Asia
  • Nov 1, 2022
  • Journal of Chinese Literature and Culture
  • Minghui Hu

  • Conference Article
  • Cite count: 1
  • 10.1109/bip56202.2022.10032482
Towards Text Simplification in Spanish: A Brief Overview of Deep Learning Approaches for Text Simplification
  • Nov 15, 2022
  • Mario Romero + 5 more

Text simplification refers to transforming a specific source text into a target text in order to increase understanding and readability for one or more specific audiences. This task demands substantial human effort and specialized knowledge, which makes automated or semi-automated computational approaches appealing. The rise of deep learning as a unifying paradigm across seemingly different fields such as image analysis, sound processing and natural language processing has considerably influenced current state-of-the-art approaches to automatic text simplification. In this work, we therefore focus on deep learning-based state-of-the-art methods for automatic text simplification in Spanish. To this end, we first disentangle the different tasks that can be addressed to produce a simplified text in general. We then review the latest deep learning-based approaches, along with the main datasets and performance metrics used in the field. We also describe approaches for dealing with small datasets and technical words. Finally, we describe some lessons for building accurate automatic text simplification systems in Spanish, a language with a noticeable shortage of work on text simplification.

  • Research Article
  • Cite count: 10
  • 10.22099/jtls.2017.26325.2324
The Effect of Reducing Lexical and Syntactic Complexity of Texts on Reading Comprehension
  • Oct 1, 2017
  • Journal of Teaching Language Skills
  • Mahmood Safari + 1 more

The present study investigated the effect of different types of text simplification (i.e., reducing the lexical and syntactic complexity of texts) on the reading comprehension of English as a Foreign Language (EFL) learners. Sixty female intermediate EFL learners from three intact classes at Tabarestan Language Institute in Tehran participated in the study. The intact classes were assigned to three experimental groups. To homogenize the groups, the researchers administered a general proficiency test (TOEFL, 2003) to the participants; the results revealed no significant difference among the groups in general proficiency and reading ability. Four reading comprehension texts from a TOEFL test (2005) were then simplified using lexical, syntactic, or lexical-syntactic simplification techniques. The simplified texts, along with their reading comprehension (RC) questions, formed three versions of the post-test, each containing either lexically, syntactically, or lexical-syntactically simplified texts. Each group took one version of the post-test. The scores were analyzed through a one-way ANOVA, which revealed a significant difference among the groups. The post hoc test indicated that the lexical-syntactic simplification group significantly outperformed the lexical simplification group and performed considerably better than the syntactic simplification group. There was no significant difference between the lexical and syntactic simplification groups, although the latter showed better results.

  • Conference Article
  • Cite count: 44
  • 10.3115/v1/w14-1206
Syntactic Sentence Simplification for French
  • Jan 1, 2014
  • Laetitia Brouwers + 3 more

This paper presents a method for the syntactic simplification of French texts. Syntactic simplification aims at making texts easier to understand by simplifying complex syntactic structures that hinder reading. Our approach is based on the study of two parallel corpora (encyclopaedia articles and tales). It aims to identify the linguistic phenomena involved in the manual simplification of French texts and organise them within a typology. We then propose a syntactic simplification system that relies on this typology to generate simplified sentences. The module starts by generating all possible variants before selecting the best subset. The evaluation shows that about 80% of the simplified sentences produced by our system are accurate.

  • Research Article
  • 10.54097/q3ej3159
A Cross-Cultural Study of Discourse Coherence: Unique Coherence Mechanisms in Chinese Literary Texts and Their Cultural Motivations
  • Jul 29, 2025
  • Journal of Education and Educational Research
  • Hefan Tang

In the context of deepening globalization, the importance of cross-linguistic and cross-cultural communication has become increasingly prominent. As a key criterion for both language comprehension and translation quality, discourse coherence faces theoretical and practical challenges posed by cultural differences. Most current coherence theories are based on English corpora and fail to account for the implicit coherence mechanisms widely found in Chinese literature. Drawing on Kehler’s theory of coherence, Asher and Lascarides’ Segmented Discourse Representation Theory (SDRT), and other relevant models, this study analyzes Chinese and English literary texts, including Honglou Meng and Pride and Prejudice, to explore differing strategies for constructing coherence. Through comparative analysis of ellipsis, imagistic linkage, and topic shift in Chinese texts, the study highlights the limitations of English-based models in explaining coherence in Chinese discourse. It argues that Chinese coherence relies heavily on shared knowledge and metaphorical structures rooted in high-context culture, whereas English discourse coherence depends on explicit logical relations and grammatical connectives. The study proposes a strategy of “cultural coherence transformation” for translation and advocates for the incorporation of cultural pragmatics into coherence theory to build a more cross-culturally applicable analytical framework.

  • Research Article
  • Cite count: 37
  • 10.1093/jamia/ocac149
A survey of automated methods for biomedical text simplification.
  • Sep 9, 2022
  • Journal of the American Medical Informatics Association
  • Brian Ondov + 2 more

Plain language in medicine has long been advocated as a way to improve patient understanding and engagement. As the field of Natural Language Processing has progressed, increasingly sophisticated methods have been explored for the automatic simplification of existing biomedical text for consumers. We survey the literature in this area with the goals of characterizing approaches and applications, summarizing existing resources, and identifying remaining challenges. We search English-language literature using lists of synonyms for both the task (e.g., "text simplification") and the domain (e.g., "biomedical"), searching for all pairs of these synonyms using Google Scholar, Semantic Scholar, PubMed, ACL Anthology, and DBLP. We expand search terms based on results and further include any pertinent papers not in the search results but cited by those that are. We find 45 papers that we deem relevant to the automatic simplification of biomedical text, with data spanning 7 natural languages. Of these (nonexclusively), 32 describe tools or methods, 13 present data sets or resources, and 9 describe impacts on human comprehension. Of the tools or methods, 22 are chiefly procedural and 10 are chiefly neural. Though neural methods hold promise for this task, scarcity of parallel data has led to continued development of procedural methods. Various low-resource mitigations have been proposed to advance neural methods, including paragraph-level and unsupervised models and augmentation of neural models with procedural elements drawing from knowledge bases. However, high-quality parallel data will likely be crucial for developing fully automated biomedical text simplification.

  • Video Transcripts
  • 10.48448/19gd-3934
Portuguese Neural Text Simplification using Machine Translation
  • Nov 16, 2021
  • Underline Science Inc.
  • Rafael Mello + 5 more

Automatic Text Simplification (ATS) has played a significant role in the Natural Language Processing (NLP) field. ATS is a sequence-to-sequence problem that aims to create a new version of the original text with complex and domain-specific words removed. It can improve communication and understanding of documents from specific domains, as well as support second language learning. This paper presents an empirical study on the use of state-of-the-art ATS methods to simplify texts in Portuguese. Notably, the literature reports that analyzing Portuguese texts is challenging due to the lack of resources compared to other languages (e.g., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.
