Abstract

Integrating machine learning into educational data analysis raises the problem of obtaining sufficient training data, especially when the missing data ratio is high. The problem stems from data partitioning: splitting an already incomplete dataset into training and testing sets leaves fewer cases for model fitting and yields less precise models. Behavioral scientists have increasingly incorporated machine learning into propensity score estimation, which calls for an investigation of the most effective training and testing partitioning methods for machine learning-based imputation. To address this gap in the literature, our Monte Carlo experiment examines the impact of partitioning methods and missing data ratios. Simulated datasets with missing ratios of 10%, 30%, 50%, and 70% are divided into training and testing sets, with splits ranging from 80–20 to 20–80. Results indicate that each imputation method yields highly accurate estimates of the average treatment effect. In maintaining covariate balance across diverse conditions, however, complex ensemble methods outperform artificial neural networks. A real-data comparison (Study II) further underscores that adopting sophisticated machine learning techniques significantly enhances covariate balance. This research offers insights into the development of machine learning-based imputation methods for educational data analysis, with a specific focus on scenarios characterized by high missing data ratios.
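To make the sample-size concern concrete, the following is a minimal sketch of how a missing ratio and a train/test split combine to shrink the training set. It is not the study's simulation code. The function name `mask_and_split`, the sample size of 1,000, the completely-at-random masking, and the choice to split only the observed cases are all assumptions made for illustration; only the four missing ratios and the two endpoint splits come from the abstract.

```python
import random

def mask_and_split(n_rows, missing_ratio, train_fraction, seed=0):
    """Hypothetical helper: mark a share of rows as missing completely at
    random (an assumption), then split the remaining observed rows into
    training and testing index sets."""
    rng = random.Random(seed)
    n_missing = round(n_rows * missing_ratio)
    missing = set(rng.sample(range(n_rows), n_missing))
    observed = [i for i in range(n_rows) if i not in missing]
    rng.shuffle(observed)
    n_train = round(len(observed) * train_fraction)
    return {
        "missing": sorted(missing),
        "train": sorted(observed[:n_train]),
        "test": sorted(observed[n_train:]),
    }

# Missing ratios named in the abstract.
MISSING_RATIOS = [0.10, 0.30, 0.50, 0.70]
# Only the endpoint splits (80-20 and 20-80) are stated; intermediate
# splits are not listed in the abstract, so none are assumed here.
TRAIN_FRACTIONS = [0.80, 0.20]

N_ROWS = 1000  # assumed sample size, for illustration only

for ratio in MISSING_RATIOS:
    for fraction in TRAIN_FRACTIONS:
        parts = mask_and_split(N_ROWS, ratio, fraction)
        print(f"missing={ratio:.0%} train_share={fraction:.0%} "
              f"train_n={len(parts['train'])} test_n={len(parts['test'])}")
```

Under these assumptions, the most demanding condition (70% missing with a 20–80 split) leaves 60 of the 1,000 rows for training, which illustrates why the partitioning choice matters most when the missing ratio is high.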
