Abstract

In this paper we propose a set of novel regression models for readability scoring in the Bengali language, which can also be used for Hindi. The models make use of several lexical, surface-level, syntactic and semantic features. We perform 5-fold and leave-one-out cross-validation on a human-annotated gold standard dataset of 30 passages, written by 4 eminent Bengali litterateurs. On this dataset, our best model achieves a mean squared error (MSE) of 57%, which is better than the state-of-the-art result (73% MSE). We further perform feature analysis to identify potentially useful features in learning a regression model for Bengali readability. Ablation studies indicate the importance of compound characters (Juktakkhors) in readability assessment.
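
As an illustration of the evaluation protocol named above, the following minimal sketch computes a cross-validated MSE for a regression model under both 5-fold and leave-one-out splits. It is not the paper's implementation: the single-feature least-squares model, the contiguous unshuffled folds, the function names `fit_line` and `cross_val_mse`, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ~ a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a constant predictor if the feature has no variance.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return a, b


def cross_val_mse(xs, ys, k):
    """Pooled mean squared error over k contiguous folds.

    Setting k == len(xs) gives leave-one-out cross-validation.
    """
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        start, stop = fold * n // k, (fold + 1) * n // k
        # Train on everything outside the held-out fold.
        a, b = fit_line(xs[:start] + xs[stop:], ys[:start] + ys[stop:])
        for x, y in zip(xs[start:stop], ys[start:stop]):
            squared_errors.append((a + b * x - y) ** 2)
    return mean(squared_errors)


# Hypothetical toy data: 30 "passages", each with one made-up feature value
# and a score that is an exact linear function of it (not real annotations).
feature = [(i * 7) % 30 / 10 for i in range(30)]
score = [1.0 + 2.0 * x for x in feature]

five_fold = cross_val_mse(feature, score, k=5)
leave_one_out = cross_val_mse(feature, score, k=len(feature))
print(five_fold, leave_one_out)
```

Because the toy scores are exactly linear in the feature, both errors should be zero up to floating-point rounding; with real annotated passages the two protocols would generally give different estimates.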
