Abstract

Massive open online courses (MOOCs) are recent and widely studied distance learning approaches aimed at providing learning material to learners from geographically dispersed locations without age, gender, or race‐related constraints. MOOCs are generally enriched by discussion forums that provide interactions among students, professors, and teaching assistants. MOOC discussion forum posts provide feedback regarding the students' learning processes, social interactions, and concerns. The purpose of our research is to present a document‐clustering model for MOOC discussion forum posts, based on weighted word embeddings and clustering, to identify question topics in discussion posts. In this study, four word‐embedding schemes (namely, word2vec, fastText, global vectors, and Doc2vec), four weighting functions (i.e., term frequency‐inverse document frequency [IDF], IDF, smoothed IDF, and subsampling function), and four clustering algorithms (i.e., K‐means, K‐means++, self‐organizing maps, and divisive analysis clustering algorithm) have been evaluated for document clustering and topic modeling on MOOC discussion forum posts. Twenty different feature representations have been obtained from the word‐embedding schemes and weighting functions. The feature representation schemes have been evaluated in conjunction with the four clustering methods. For the evaluation task, the empirical results for latent Dirichlet allocation have also been included. The empirical results in terms of adjusted Rand index, normalized mutual information, and adjusted mutual information indicate that weighted word‐embedding schemes combined with clustering algorithms outperform the conventional schemes.
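As a rough illustration of the pipeline the abstract describes (weight each word vector, average the vectors into a document representation, then cluster), the following minimal sketch may help. Everything in it is an assumption for illustration rather than the study's implementation: the toy posts, the two-dimensional vectors, the function names, the particular smoothed IDF variant (the `log((1 + N) / (1 + df)) + 1` form, which is one common definition), and the deterministic farthest-point seeding used in place of random K‐means or K‐means++ initialization.

```python
import math
from collections import Counter


def smoothed_idf(docs):
    """One common smoothed IDF variant: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def weighted_doc_vector(doc, vectors, weights):
    """Weighted mean of the word vectors of the in-vocabulary tokens."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in doc:
        if tok in vectors:
            w = weights.get(tok, 0.0)
            total = [a + w * b for a, b in zip(total, vectors[tok])]
            weight_sum += w
    return [a / weight_sum for a in total] if weight_sum else total


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point seeding.

    The seeding is a simplification chosen for reproducibility; it is not
    the random sampling used by K-means++.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical tokenized forum posts (not data from the study).
docs = [
    ["deadline", "quiz", "extension"],
    ["quiz", "grading", "deadline"],
    ["video", "lecture", "subtitles"],
    ["lecture", "video", "download"],
]

# Hypothetical 2-d embeddings standing in for word2vec/fastText/GloVe vectors.
vectors = {
    "deadline": [1.0, 0.0], "quiz": [0.9, 0.1],
    "extension": [0.8, 0.2], "grading": [0.9, 0.0],
    "video": [0.0, 1.0], "lecture": [0.1, 0.9],
    "subtitles": [0.2, 0.8], "download": [0.0, 0.9],
}

idf = smoothed_idf(docs)
doc_vectors = [weighted_doc_vector(d, vectors, idf) for d in docs]
labels = kmeans(doc_vectors, k=2)
print(labels)
```

In this toy setting the two assessment-related posts land in one cluster and the two lecture-video posts in the other. With real data, the embeddings would come from a trained model, and cluster quality would be scored against reference labels with measures such as the adjusted Rand index.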
