Abstract

With the flourishing development of corpus linguistics and technological revolutions in the AI-powered age, automated essay scoring (AES) models have been intensively developed. However, the intricate relationship between linguistic features and different constructs of writing quality has yet to be thoroughly investigated. The present study harnessed computational analytic tools and Principal Component Analysis (PCA) to distill and refine linguistic indicators for model construction. Findings revealed that micro-features, both alone and in combination with aggregated features, described writing quality more robustly than aggregated features alone. Linear and non-linear models were then developed to explore the associations between linguistic features and different constructs of writing quality. The non-linear AES model based on Random Forest Regression outperformed the other benchmark models. Furthermore, SHapley Additive exPlanations (SHAP) was employed to pinpoint the most powerful linguistic features for each rating trait, enhancing the model’s transparency through explainable AI (XAI). These insights hold the potential to substantially advance multi-dimensional approaches to writing assessment and instruction.

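As a rough illustration of the pipeline the abstract describes (PCA-based feature distillation, a Random Forest Regression scoring model, and SHAP-based interpretation), the sketch below uses scikit-learn and the shap package on placeholder data. The feature matrix, scores, and hyperparameters are hypothetical stand-ins, not the study's actual corpus or settings.

# A minimal, hypothetical sketch of the pipeline described above:
# PCA-distilled linguistic features -> Random Forest Regression -> SHAP.
import numpy as np
import shap
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Placeholder data: rows are essays, columns are linguistic micro-features
# (e.g., lexical, syntactic, or cohesion indices); y is a rating-trait score.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Distill and refine indicators: keep PCA components explaining 95% of variance.
pca = PCA(n_components=0.95).fit(X_train)
X_train_pca, X_test_pca = pca.transform(X_train), pca.transform(X_test)

# Non-linear AES model: Random Forest Regression.
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train_pca, y_train)
print("R^2 on held-out essays:", model.score(X_test_pca, y_test))

# SHAP values quantify each (component) feature's contribution per essay;
# averaging their magnitudes ranks the most influential features overall.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test_pca)
mean_abs = np.abs(shap_values).mean(axis=0)
print("Most influential components:", np.argsort(mean_abs)[::-1][:5])

In the study itself, SHAP importances were computed per rating trait; the same pattern applies by fitting one such model per trait score.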