Abstract

Text simplification is crucial in bridging the comprehension gap in today’s information-rich environment. Despite advancements in English text simplification, languages with intricate grammatical structures, such as Greek, often remain under-explored. The complexity of Greek grammar, characterized by its flexible syntactic ordering, presents unique challenges that hinder comprehension for native speakers, learners, tourists, and international students. This paper introduces a comprehensive dataset for Greek text simplification, containing over 7500 sentences across diverse topics such as history, science, and culture, tailored to address these challenges. We outline the methodology for compiling this dataset, including a collection of texts from Greek Wikipedia, their annotation with simplified versions, and the establishment of robust evaluation metrics. Additionally, the paper details the implementation of quality control measures and the application of machine learning techniques to analyze text complexity. Our experimental results demonstrate the dataset’s initial effectiveness and potential in reducing linguistic barriers and enhancing communication, with initial machine learning models showing promising directions for future improvements in classifying text complexity. The development of this dataset marks a significant step toward improving accessibility and comprehension for a broad audience of Greek speakers and learners, fostering a more inclusive society.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.