Articles published on Automatic Text Simplification
1201 Search results
- Research Article
- 10.1108/jarhe-08-2025-0634
- Apr 21, 2026
- Journal of Applied Research in Higher Education
- Mohammad Hossein Ronaghi + 1 more
Purpose Rapid advancements in artificial intelligence (AI) technology have brought about numerous opportunities and challenges in various fields, including education and research. ChatGPT, as an AI tool, can generate complex analytical texts based on simple texts. However, the use of this tool poses various legal and ethical challenges, including authenticity, researcher participation and plagiarism. Accordingly, the present study aims to evaluate the acceptance of using ChatGPT among university faculty members. Design/methodology/approach The extended unified theory of acceptance and use of technology (UTAUT) model was employed in the survey, in which 625 university faculty members participated. Findings The research findings demonstrated that the social influence of ChatGPT among university professors has an impact on their utilization of this tool in educational and research processes. Additionally, security and trust influenced research participants in using ChatGPT. The results of this study indicate that ChatGPT has numerous capabilities in the field of education and research. Originality/value The theoretical contribution of this research is to provide a model and identify the factors influencing the acceptance of ChatGPT in the academic field. In this model, the factors of performance expectation, ease of use of technology, degree of efficiency, social influence, security, trust, infrastructure and environment were effective in the adoption of ChatGPT. However, university administrators should establish the educational infrastructure for utilizing this tool and develop implementing regulations while considering the legal challenges.
- Research Article
- 10.1016/j.neunet.2025.108465
- Apr 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Lifan Jiang + 4 more
Vidsketch: Hand-drawn sketch-driven video generation with diffusion control.
- Research Article
- 10.1080/10790268.2026.2639819
- Mar 31, 2026
- The Journal of Spinal Cord Medicine
- Alejandro García-Rudolph + 5 more
Context Effective communication of complex medical information is critical for individuals with spinal cord injury (SCI) and their families, but this need remains largely unmet. Artificial intelligence (AI), including large language models (LLMs) like GPT-4o, may help simplify clinical texts while preserving essential medical information. However, their ability to adapt content across ages, education levels, and languages without compromising accuracy has not been systematically evaluated. Findings We analyzed short excerpts (∼100 words) from de-identified Spanish-language SCI clinical reports that had historically posed communication challenges. A bilingual co-author produced human English translations, creating parallel English and Spanish corpora (10 texts per language: 4 originals and 6 audience-tailored simplifications). GPT-4o was applied for within-language simplification, not translation. Baseline readability quantifies the complexity of real-world clinical documentation directly. All original English texts had very high complexity by Flesch-Kincaid Grade Level (FKGL >18), whereas simplified texts reached FKGL 7.3 for children and 9.8–10.6 for adolescents and adults. Experts rated child-directed fidelity 2–3/5; adult-directed texts scored 4–5/5 for fidelity and appropriateness. Original Spanish texts ranged from “Difficult” to “Very Difficult” by the Fernández–Huerta (FH) index, showing greater variability than English. Readability classifications were not concordant across languages (3/10 cases, 30%); e.g. one simplified version was “Plain English” by FKGL but “Fairly Difficult” by FH, highlighting language-specific behavior. Conclusion/Clinical Relevance GPT-4o can tailor complex SCI clinical excerpts to specific audiences in English and Spanish, but child-directed versions may lose clinically relevant information. Clinician oversight remains essential for safe patient communication.
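The Flesch-Kincaid Grade Level cited in this abstract follows the standard formula 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal Python sketch is given here for orientation only. The helper `count_syllables` is a hypothetical vowel-group heuristic introduced for illustration, not the tool used in the study, so its scores will differ somewhat from those of a dictionary-based calculator.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups (heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Longer sentences and longer words both raise the score, which is why dense clinical prose can exceed grade 18 while short, plain sentences score far lower.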
- Research Article
- 10.30574/ijsra.2026.18.3.0585
- Mar 31, 2026
- International Journal of Science and Research Archive
- Joan B Consulta
The continuous decline in Filipino children’s reading and comprehension skills has become alarming, worsened by the COVID-19 pandemic. Reports from the OECD and World Bank show that a large percentage of students in the country struggle to read simple texts. Lack of resources, poor internet access, and limited adult guidance further hindered learning. This highlights the need for innovative teaching methods that go beyond reading and answering questions and instead stimulate children’s emotions and interest. One such strategy is Sayawit—a combination of song, dance, and poetry recitation—designed to make learning more colorful, lively, and meaningful. This study aims to examine the effectiveness of Sayawit in teaching poetry and its impact on students’ reading comprehension. Using both qualitative and quantitative methods, the research assessed learning levels, barriers, and positive outcomes. Participants were Grade 4 students divided into an experimental group (using Sayawit) and a control group (traditional method). Results are expected to show that Sayawit increases enthusiasm, self-confidence, and comprehension, while also fostering collaboration, interaction, and emotional expression. Overall, Sayawit is presented as an innovative response to reading difficulties, promoting higher learning and appreciation of Filipino language and culture.
- Research Article
- 10.1186/s12889-026-27166-x
- Mar 28, 2026
- BMC public health
- Jacqueline Nkrumah
Adolescents in low-resource settings often lack access to sexual and reproductive health (SRH) materials that match their literacy levels. In Ghana, despite policy support for adolescent SRH education, implementation gaps persist due to the complexity of available texts. This paper presents a cyclical, theory-driven conceptual framework for developing adolescent-friendly SRH educational resources. It empirically assesses its application using evidence from a quasi-experimental study in Ghana. This framework was developed through an iterative process, including a baseline needs assessment, resource design, stakeholder validation, and evaluation of quasi-experimental methods. A review of existing materials and a literacy assessment identified major gaps in comprehension and vocabulary demands. Content was synthesized from six SRH themes and simplified into two formats: a text-only version and a picture-enhanced version with static illustrations. Validation was conducted with educators, health officials, adolescents, and academic experts. The resources were piloted with 317 adolescents aged 11–15 over six weeks. A total of 249 participants completed the study. Compared to the control group, users of the simplified and picture-enhanced texts showed significant improvements in SRH decision-making skills (+11 points, p < 0.01) and print literacy scores (+13–14 points, p < 0.01). A 21% dropout rate, linked to stigma and sociocultural resistance, highlighted the need for greater community and parental engagement. The framework offers a stepwise, theory-informed approach to developing adolescent-friendly SRH materials. By reducing cognitive load and improving readability, it supports equitable access to SRH education.
- Research Article
- 10.38124/ijisrt/26mar1237
- Mar 25, 2026
- International Journal of Innovative Science and Research Technology
- P Leela Sesha Balaji + 4 more
Recruitment today is under immense pressure because companies receive a surge of resumes for each available job. Listing and comparing all the resumes manually consumes a lot of time and may produce unfair or biased results. To address this, the Smart Resume Filtering and Tailoring System has been created. It is an artificial intelligence-based system that benefits recruiters and job seekers alike. Job applicants can input their resume and a job advertisement to obtain a match score, a list of missing skills, and suggestions for improving the resume. The system also allows a recruiter to input multiple resumes for a single job posting and automatically rank the applicants against the job requirements. The system uses simple Natural Language Processing (NLP) techniques, text mining, and rule-based verification to determine match scores. It also offers secure login and data storage. The project saves time, enhances the accuracy of recruitment, and helps applicants understand how to improve their resumes.
- Research Article
- 10.55041/ijsrem58203
- Mar 25, 2026
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Mrs Nilima Gite + 3 more
Abstract - This project focuses on developing a smart and automated tool designed to analyze large volumes of comments in real time. The primary objective is to classify comments into sentiment categories such as positive, negative, or neutral, while also incorporating mechanisms to filter out spam and toxic content that may include offensive or abusive language. Unlike traditional sentiment analysis systems, this project integrates emoji mapping, where emojis are interpreted and mapped to their respective sentiment categories, resulting in a more precise and user-centric analysis of digital expressions. To enhance the interpretability of results, the system provides graphical visualizations such as pie charts, bar graphs, and word or emoji clouds. These visual tools allow users to quickly understand sentiment distribution, keyword frequency, and common emotional tones within a large dataset of comments. Moreover, the project includes YouTube integration, enabling real-time fetching and analysis of video comments directly from the platform. This functionality makes it especially relevant for content creators, marketers, and businesses who seek to monitor audience engagement and feedback continuously. The system has broad applicability across multiple domains. In social media monitoring, it helps track public opinion and trends. For product reviews, it assists businesses in evaluating customer satisfaction and areas of improvement. In educational settings, it can analyze feedback from students to improve learning experiences. Additionally, for content moderation, the tool helps platforms automatically detect and flag spam or toxic comments, creating safer digital environments. Overall, this project aims to deliver an efficient, scalable, and insightful sentiment analysis solution that goes beyond simple text classification. 
By combining comment analysis, spam detection, emoji sentiment mapping, and real-time visualization, it empowers users with actionable insights, ultimately improving decision-making and audience engagement. Key Words: Sentiment Analysis, Comment Analysis, Spam Detection, Toxic Comment Filtering, Emoji Mapping, Data Visualization, YouTube Comments, Feedback Analysis, Social Media Monitoring
- Research Article
- 10.5430/wje.v16n1p139
- Mar 18, 2026
- World Journal of Education
- Arnantawut Tiang-Uan
This research aimed to 1) analyze the teacher training program situation problems using authentic materials on EFL teachers' reading exercise design competency and teaching self-efficacy, 2) develop a teacher training program based on the TPACK framework using authentic materials on EFL teachers' reading exercise design competency and teaching self-efficacy, and 3) evaluate a teacher training program based on the TPACK framework using authentic materials on EFL teachers' reading exercise design competency and teaching self-efficacy. The study involved 30 purposively sampled EFL Thai secondary teachers in Thonburi. Data were collected via competency tests, self-efficacy scales, rubrics, interviews, and observations, then analyzed using statistical and thematic methods. The findings revealed that prior practices relied heavily on simplified texts and Lower-Order Thinking Skills. Post-training, teachers achieved a 'Proficient' level in pedagogical competency, successfully designing structured plans using the P3C2R+GIRD model. Additionally, teaching self-efficacy increased to a "very high" level (M=4.38). However, technology integration skills remained 'Foundational,' suggesting that while the program successfully elevated pedagogy and confidence, further support is needed to achieve transformative technological application.
- Research Article
- 10.1177/00332941261436720
- Mar 17, 2026
- Psychological reports
- Yusuf Kızıltaş
By the end of primary school, every student is expected to be able to read and comprehend a simple text comfortably. A 10-year-old student's inability to comprehend what they read causes them to experience learning poverty. Learning poverty is a significant problem that many students face today. The purpose of this study is to investigate the levels of learning poverty in the context of reading proficiency among disadvantaged primary school students in rural areas. The study, developed using quantitative research methods, employed a criterion sampling method. All 606 students in the study, whose data were collected in 2025, are in the fourth grade of primary school. The students continue their education in five different rural provinces of Türkiye. The CHAID analysis revealed that primary school students in rural areas have high levels of learning poverty. According to the research findings, the strongest predictor of learning poverty among primary school students is teacher mobility in rural schools. In addition, inadequate nutrition and the absence of lunchboxes emerge as significant determinants of learning poverty. Furthermore, having a large number of siblings in the household (particularly seven or more), low levels of parental involvement in schooling, limited access to reading resources, and student absenteeism are also prominent factors associated with learning poverty. The findings indicate that learning poverty cannot be explained solely by individual academic failures or deficiencies; rather, it is strongly shaped by structural inequalities and socioeconomic disadvantages. In this regard, ensuring the continuity and stability of teachers assigned to rural areas is as critical as increasing the number of teacher appointments to these regions. 
Based on these results, the implementation of school-based and national nutrition programs, along with initiatives aimed at strengthening parental involvement, appears essential for effectively combating learning poverty.
- Research Article
- 10.1007/s10579-025-09879-4
- Mar 3, 2026
- Language Resources and Evaluation
- Christina Niklaus + 3 more
Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This paper fills this gap by introducing large-scale sentence-aligned simplification resources for Spanish, developed from the Newsela and ClearSim corpora. We propose detailed guidelines for manual alignment, evaluate a wide range of automatic sentence alignment algorithms, and present the first systematic exploration of LLM-based monolingual sentence alignment in Spanish. Our analysis incorporates comprehensive quantitative and qualitative evaluation, supported by statistical significance testing, and reveals clear differences in the structural simplification patterns across corpora. In addition, we train and release baseline ATS models using the new aligned datasets, demonstrating their practical utility for downstream simplification. All alignment code, trained models, and evaluation scripts will be publicly released to ensure transparency and reproducibility. Together, these contributions substantially advance the resources and methodology for Spanish-language ATS.
- Research Article
- 10.1142/s1793351x26410035
- Mar 1, 2026
- International Journal of Semantic Computing
- Shubham S Bhatt + 1 more
We introduce a Reinforcement Learning (RL)-based framework to optimize discrete natural language prompts for enhancing both the accuracy and clarity in sentence simplification. Using a lightweight PPO policy, our method learns to guide a frozen small-scale LLaMA-3.2 3B model toward effective simplification for supporting user-centric computational thinking tasks. Results show that our RL-optimized prompts significantly surpass manual baselines in semantic fidelity, logical coherence, and instructional quality. Moreover, the proposed RL-optimized prompting approach enables a much smaller LLM to achieve results that are comparable in clarity and instructional value to those produced by a much larger LLaMA-3.3 70B model.
- Research Article
- 10.56578/ataiml050105
- Mar 1, 2026
- Acadlore Transactions on AI and Machine Learning
- Abebe Kindie Awuraris + 2 more
This paper explored how generative artificial intelligence (AI) could enhance digital accessibility for individuals with visual, auditory, and cognitive impairments. It aims to develop an adaptive and context-sensitive system that dynamically customizes content according to users' needs. The proposed system performs text simplification with generative AI models such as Generative Pretrained Transformer 3 (GPT-3) and captions images with Contrastive Language-Image Pre-Training (CLIP). It adapts to users' reactions through reinforcement learning, enabling the generation of real-time, personalized content. The project tested system performance with mixed data, including texts, images, and videos. The outcomes revealed that the accessibility of the content had increased significantly. The Flesch-Kincaid Grade Level was reduced by 50% through text simplification, and the bilingual evaluation understudy (BLEU) score reached 0.74 for image captioning. User satisfaction increased by 15% after feedback corrections. In addition, the system demonstrated high effectiveness in supporting auditory-impaired users, achieving a subtitle synchronization accuracy of 94.6% in video content and increasing auditory user satisfaction by 18% during accessibility evaluations. This study contributes to AI-based accessibility and to a more inclusive online environment for people with disabilities, thus facilitating their access to online content. In conclusion, the proposed system is more convenient and could offer a broader range of individual and time-sensitive user experiences than current accessibility models.
- Research Article
- 10.2106/jbjs.25.00982
- Feb 26, 2026
- The Journal of bone and joint surgery. American volume
- Joseph E Nassar + 9 more
Orthopaedic patient education materials (PEMs) within Epic's Elsevier library often exceed the recommended sixth-grade reading level, with a mean grade of 8.6 in English and 5.8 in Spanish, risking poor patient comprehension and adherence. The present study evaluated whether artificial intelligence (AI)-based text simplification can improve readability while preserving clinical accuracy. The objectives were to use previously established readability data for English and Spanish PEMs as baselines, to assess the impact of human-based and ChatGPT-based simplification on reading grade level, and to compare the fidelity of simplified texts against standard materials. In March 2025, 806 orthopaedic PEM documents were simplified using standardized ChatGPT prompts. Readability was reassessed using validated English and Spanish formulas, and fidelity was evaluated in the 86 PEMs that also had human easy-to-read versions. Two blinded clinicians compared human and ChatGPT-4o outputs with the originals to identify hallucinations, omissions, and inconsistencies according to severity. Following the release of ChatGPT-5, an unblinded post hoc analysis was performed using identical criteria. ChatGPT-4o-simplified PEMs showed mean reading grade levels of 6.1 in English and 3.5 in Spanish. Compared with human simplifications, ChatGPT-4o showed fewer English omissions, similar Spanish omissions, fewer inconsistencies in both languages, and comparable English hallucinations, but higher Spanish hallucinations. Compared with ChatGPT-4o, ChatGPT-5 preserved English performance and improved Spanish fidelity, reducing hallucinations to human-comparable rates. AI-driven simplification can produce orthopaedic PEMs that are easier to read while maintaining acceptable fidelity. The improvements observed with ChatGPT-5 highlight its potential for clinician-supervised use in generating accessible and reliable PEMs. 
This study is clinically relevant because orthopaedic PEMs are routinely delivered through the Epic electronic health record and directly affect patient understanding, consent, and adherence in both English and Spanish. By evaluating the readability and fidelity of AI-simplified materials across languages, this study informs safe, scalable strategies to improve patient communication in everyday orthopaedic practice.
- Research Article
- 10.3148/cjdpr-2025-031
- Feb 12, 2026
- Canadian journal of dietetic practice and research : a publication of Dietitians of Canada = Revue canadienne de la pratique et de la recherche en dietetique : une publication des Dietetistes du Canada
- Angela Luo + 4 more
Purpose: This study aimed to determine the perception of hospitalized patients towards two educational malnutrition infographics. Methods: A cross-sectional intercept interview was conducted with participants with a nutrition diagnosis of "malnourished" as determined by subjective global assessment. Participants were asked to provide feedback on two different infographics on malnutrition. Results: Nineteen out of 50 (38%) participants consented to participate. Most participants (n=13/19, 68.4%) had been hospitalized for at least one week. Of the participants, 78.9% (n=15/19) found the information in both infographics useful. Fifty-eight percent (n=11/19) said a dietitian had come to see them. Twenty-six percent (n=5/19) had identified that living alone would be a barrier to eating to meet needs after leaving the hospital, while 26.3% (n=5/19) identified poor appetite as a challenge. Sixty-three percent (n=12/19) of participants reported that the words "loss of independence" on infographic 1 stood out to them the most, whereas the words "fuel up to heal up" were noted for infographic 2. Discussion: Participants related to messages that aligned with their health experiences. Participants provided practical tips such as using simpler texts and images in the posters. Future research should examine the effectiveness of infographics in motivating patients to increase their dietary intake.
- Research Article
- 10.1515/les-2026-0003
- Feb 9, 2026
- Lebende Sprachen
- Giuliana Fiorentino + 1 more
Abstract The simplification of language – particularly with regard to administrative discourse – has long been a central concern within Italian linguistics. Over the past few decades, significant progress has been made, including the development of consolidated and widely accepted lists of linguistic features – both morphosyntactic and lexical – that influence textual simplicity and accessibility (cf. Fiorentino/Ganfi 2024). These advances contributed to the early creation of a readability index, the Gulpease index, in the 1980s (cf. Lucisano/Piemontese 1988). Within this framework, the authors have developed software for the automatic simplification of administrative texts, supported by QWEN3 (a large language model, LLM), entitled SEMPL-IT (cf. Russodivito et al. 2024; Fiorentino/Russodivito 2025; Ganfi/Russodivito 2025; Fiorentino et al. forthcoming; Fiorentino/Russodivito forthcoming). As part of this project, a corpus named ItaIst (Fiorentino et al. 2024b; publicly available on Hugging Face at https://huggingface.co/datasets/VerbACxSS/ItaIst, 15 July 2025) was compiled and subjected to automatic simplification using the BASIC approach, resulting in a parallel corpus of simplified texts. This simplified corpus was then compared to the source corpus and evaluated in terms of improved readability and semantic similarity (cf. Chandrasekaran et al. 2021), with the objective of validating the effectiveness of the simplification process. In this contribution, we introduce and validate a new methodology – the CHAIN approach – applied to a different corpus, ItaRegol (Fiorentino et al. 2024a; publicly available on Hugging Face at https://huggingface.co/datasets/VerbACxSS/ItaRegol, 15 July 2025). Although smaller in size than ItaIst, ItaRegol comprises rules and regulations, i.e., legally binding texts that create, modify, or extinguish subjective legal positions.
Due to the legal nature of these texts, simplification must be carried out with caution to avoid altering their legal effects. This paper compares the two simplification approaches – BASIC and CHAIN – by evaluating the parameters adopted, assessing the quality of the simplified output, and drawing conclusions regarding the differing impact of these strategies in enhancing the readability of administrative versus regulatory texts.
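The Gulpease index mentioned in this abstract is commonly given as 89 + (300 × sentences − 10 × letters) / words, with higher values indicating easier text. The Python sketch below illustrates that formula only. The tokenization rules are simplifying assumptions (for example, an elided form such as "l'ufficio" is counted as two words), and it is not the scoring code used in SEMPL-IT.

```python
import re


def gulpease(text: str) -> float:
    """Raw Gulpease score: 89 + (300 * sentences - 10 * letters) / words."""
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is any run of Unicode letters (accented vowels included).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    letters = sum(len(w) for w in words)
    return 89 + (300 * len(sentences) - 10 * letters) / len(words)
```

Because the index relies on letter counts rather than syllables, it is straightforward to compute for Italian. Very short inputs can yield raw values above 100, so scores are best compared across texts of similar length.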
- Research Article
- 10.1002/acr.80016
- Feb 9, 2026
- Arthritis care & research
- Abimbola Fadairo-Azinge + 6 more
The American College of Rheumatology recommends HLA-B*58:01 allele testing before the initiation of allopurinol, specifically among Asian and African American/Black patients, due to their increased risk for severe hypersensitivity reactions. However, testing rates remain low at many health care facilities. This study aimed to determine whether the introduction of a clinician-facing dashboard, with or without a best practice alert (BPA), would increase HLA-B*58:01 testing rates among eligible patients at two VA medical centers. In October 2022, we launched a clinician-facing dashboard that displayed HLA-B*58:01 testing results for all patients prescribed allopurinol at two VA medical centers. In January 2023, we added a synchronous BPA that appeared at one of the two medical centers whenever a provider entered a prescription for allopurinol. The BPA was presented as blue text on the medication order entry form and stated "HLA-B*58:01 genotyping recommended before rx for Asian or African American." Using data from the electronic health record, we compared the change in proportion of patients receiving allopurinol prescriptions who had HLA-B*58:01 genotype testing done after implementation of the dashboard (±BPA) at the two VA medical centers. From October 2022 through December 2023, the number of Asian or African American/Black patients who filled one or more prescriptions for allopurinol was 262 at the BPA + dashboard site and 330 at the dashboard only site. The percentage of Asian or African American/Black patients who had HLA-B*58:01 testing before or during the month allopurinol was prescribed increased from 8.8% in October 2022 to 35.5% in December 2023 at the dashboard + BPA site and from 2.1% to 4.5% at the dashboard only site (P < 0.0001 for difference in difference). 
The cumulative percentage of prescribers whose patient(s) completed HLA-B*58:01 testing increased from 11.7% to 46.8% at the dashboard + BPA site and from 6.9% to 10.8% at the dashboard only site (P < 0.0001 for difference in difference). Implementing a dashboard plus simple text BPA was associated with a greater increase in the proportion of patients who completed guideline-recommended HLA-B*58:01 testing compared to implementing a dashboard alone.
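The difference-in-difference comparison reported in this abstract can be made concrete with simple arithmetic on the published percentages. The Python sketch below computes only the descriptive point estimates. The function name `diff_in_diff` is introduced here for illustration, and the reported P values come from the authors' statistical analysis, which is not reproduced.

```python
def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Change at the treated site minus change at the comparison site."""
    return (treated_after - treated_before) - (control_after - control_before)


# Percentages taken from the abstract (dashboard + BPA site vs. dashboard only site).
testing_did = diff_in_diff(8.8, 35.5, 2.1, 4.5)        # patients tested
prescriber_did = diff_in_diff(11.7, 46.8, 6.9, 10.8)   # prescribers with tested patients

print(f"Patient testing: {testing_did:.1f} percentage points")
print(f"Prescribers:     {prescriber_did:.1f} percentage points")
```

On these figures, the site with the alert gained roughly 24 percentage points more in patient testing and roughly 31 more in prescriber uptake than the dashboard only site.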
- Research Article
- 10.1097/js9.0000000000004454
- Feb 3, 2026
- International journal of surgery (London, England)
- Haoyang Zeng + 9 more
The aim was to evaluate the application value of three ChatGPT versions and Gemini in pathology report simplification tasks for prostate cancer. This retrospective study assessed GPT-3.5, GPT-4.0, GPT-4o, and Gemini on pathology reports from 228 prostate cancer patients across two institutions. Data were split into internal (center 1, n = 171) and external (center 2, n = 57) cohorts. Using specific prompts, models generated simplified texts. The evaluation of outputs included three main dimensions: (1) human scoring by patients, clinicians, and pathologists; (2) readability scores; and (3) BERT-based semantic similarity scores. Statistical comparisons employed paired t-tests or Wilcoxon signed-rank tests. Statistical consistency between raters was assessed using squared weighted kappa, intraclass correlation coefficient (3,1), and percent agreement, with 95% confidence intervals calculated for all metrics. GPT-4o (Few-Shot) achieved the highest accuracy and comprehensiveness scores from pathologists, while Gemini demonstrated the best understandability. Patient and clinician understandability ratings were consistently high across models. Mean Reading Grade Level scores varied between internal and external datasets, with GPT-4o Few-Shot performing best overall. BERT-based semantic similarity scores demonstrated distinct trends across models, reflecting differences in text simplification strategies. LLMs adopt distinct trade-off strategies between simplifying pathology reports and preserving their structure and logic, influenced by prompt design and textual style. Their application shows potential to enhance patient comprehension and clinical communication. Future work should focus on domain-specific fine-tuning to ensure safe and reliable clinical integration.
- Research Article
- 10.22214/ijraset.2026.77156
- Jan 31, 2026
- International Journal for Research in Applied Science and Engineering Technology
- Niya L R
Dyslexia is a neurodevelopmental learning disorder that affects reading fluency, spelling accuracy, and written expression, often resulting in academic difficulties and reduced self-confidence. Conventional assistive tools provide limited support by addressing isolated learning challenges. This paper presents WordWhiz, an AI-powered assistive system designed to enhance reading comprehension and writing accuracy for individuals with dyslexia. The proposed system integrates text-to-speech with word highlighting, speech-to-text with grammar correction, phonetic spelling assistance, and transformer-based sentence simplification within a unified framework. The system is implemented as a device-based application to ensure low latency, data privacy, and offline usability. Experimental evaluation demonstrates improved text readability, reduced grammatical errors, and enhanced user engagement, validating the effectiveness of AI-driven assistive technologies in inclusive education.
- Research Article
- 10.2196/77149
- Jan 23, 2026
- JMIR AI
- Amela Miftaroski + 3 more
Background: Patient education materials (PEMs) found online are often written at a complexity level too high for the average reader, which can hinder understanding and informed decision-making. Large language models (LLMs) may offer a solution by simplifying complex medical texts. To date, little is known about how well LLMs can handle simplification tasks for German-language PEMs. Objective: The study aims to investigate whether LLMs can increase the readability of German online medical texts to a recommended level. Methods: A sample of 60 German texts originating from online medical resources was compiled. To improve the readability of these texts, four LLMs were selected and used for text simplification: ChatGPT-3.5, ChatGPT-4o, Microsoft Copilot, and Le Chat. Next, readability scores (Flesch reading ease [FRE] and Wiener Sachtextformel [4th Vienna Formula; WSTF]) of the original texts were computed and compared to the rephrased LLM versions. A Student t test for paired samples was used to test the reduction of readability scores, ideally to or lower than the eighth grade level. Results: Most of the original texts were rated as difficult to quite difficult (average WSTF 11.24, SD 1.29; FRE 35.92, SD 7.64). The LLMs achieved the following average scores: ChatGPT-3.5 (WSTF 9.96, SD 1.52; FRE 45.04, SD 8.62), ChatGPT-4o (WSTF 10.6, SD 1.37; FRE 39.23, SD 7.45), Microsoft Copilot (WSTF 8.99, SD 1.10; FRE 49.0, SD 6.51), and Le Chat (WSTF 11.71, SD 1.47; FRE 33.72, SD 8.58). ChatGPT-3.5, ChatGPT-4o, and Microsoft Copilot showed a statistically significant improvement in readability. However, the t tests yielded no statistically significant results for the reduction of scores lower than the eighth grade level. Conclusions: LLMs can improve the readability of PEMs in German. This moderate improvement can support patients reading PEMs online.
LLMs demonstrated their potential to make complex online medical text more accessible to a broader audience by increasing readability. This is the first study to evaluate this for German online medical texts.
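For readers unfamiliar with the fourth Wiener Sachtextformel, it is usually stated as 0.2744 × MS + 0.2656 × SL − 1.693, where MS is the percentage of words with three or more syllables and SL is the mean sentence length in words; the result approximates a school grade level. The Python sketch below takes pre-computed counts as input, because German syllable counting is itself a non-trivial step. The function name, parameters, and example counts are hypothetical and are not taken from the study.

```python
def wstf4(words: int, sentences: int, words_3plus_syllables: int) -> float:
    """Fourth Wiener Sachtextformel: 0.2744*MS + 0.2656*SL - 1.693."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    ms = 100.0 * words_3plus_syllables / words   # % of words with >= 3 syllables
    sl = words / sentences                       # mean sentence length in words
    return 0.2744 * ms + 0.2656 * sl - 1.693


# Hypothetical counts: a 100-word text with 10 sentences and 20 long words.
print(round(wstf4(100, 10, 20), 3))
```

Both shorter sentences and fewer polysyllabic words lower the score, which is the direction a simplification has to move to approach the eighth grade target named in the abstract.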
- Research Article
- 10.51558/2303-4858.2025.13.2.181
- Jan 22, 2026
- ExELL
- Nejla Kalajdžisalihović
The paper explores a more comprehensive approach to assessing text-level difficulty by combining quantitative readability metrics with qualitative analyses of content and context, which help in reading comprehension and reading-for-translation. It compares two excerpts using eight readability formulas (Automated Readability Index, Flesch Reading Ease, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Readability Index, SMOG Index, Original Linsear Write Formula, Linsear Write Grade Level Formula) to explore how topic, content, and context may be used as indicators of text-level difficulty. Using authentic texts, specifically interviews from Humans of New York, the paper aims to demonstrate that other (extra)linguistic features must be considered beyond the numerical scores provided by readability formulas.
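Two of the eight formulas listed here, the Automated Readability Index and the Coleman-Liau Index, depend only on character, word, and sentence counts, so they can be sketched without a syllable counter. The Python example below uses the commonly published coefficients. The tokenization is a simplifying assumption, and the sample sentence is invented for illustration rather than drawn from the paper's excerpts.

```python
import re


def _counts(text: str) -> tuple[int, int, int]:
    """Return (characters, words, sentences) under simple tokenization assumptions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    chars = sum(len(w) for w in words)
    return chars, len(words), len(sentences)


def ari(text: str) -> float:
    """Automated Readability Index: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    chars, words, sentences = _counts(text)
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau(text: str) -> float:
    """Coleman-Liau Index: 0.0588*L - 0.296*S - 15.8 (L, S per 100 words)."""
    chars, words, sentences = _counts(text)
    letters_per_100 = 100.0 * chars / words
    sentences_per_100 = 100.0 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


sample = "The cat sat on the mat. It was happy."
print(round(ari(sample), 2), round(coleman_liau(sample), 2))
```

Formulas of this kind see only surface counts, which is consistent with the paper's point that topic and context have to be assessed separately.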