The field of natural language processing has experienced significant advances in recent years, but these advances have not yet resulted in improved analytics for instructors on MOOC platforms. Valuable information, such as suggestions, is generated in the comment forums of these courses, but due to their volume, manual processing is often impractical. This study examines the feasibility of fine-tuning and effectively utilizing state-of-the-art deep learning models to identify comments that contain suggestions in MOOC forums. The main challenges encountered are the lack of labeled datasets from the MOOC context for fine-tuning classification models and the soaring computational cost of this training. For this study, we manually collected and labeled 2228 comments in Spanish and English from 5 MOOCs and scraped 1.4 million MOOC reviews from 3 platforms. We fine-tuned and evaluated 4 pretrained models based on the transformer architecture and 3 traditional machine learning models to compare their effectiveness in the suggestion mining task in this domain. Transformer-based models proved to be highly effective in this task/domain combination, achieving performance levels that matched or exceeded those deemed appropriate in other contexts and were significantly greater than those achieved by traditional models. Domain adaptation led to improved linguistic understanding of the target domain; however, in this project, this approach did not translate into an observable improvement in suggestion mining. The automated identification of comments that can be labeled as suggestions can result in considerable time savings for instructors, especially considering that less than a quarter of the analyzed comments contain suggestions.
Read full abstract