Advances in computational language models increasingly enable adaptive support for self‐regulated learning (SRL) in digital learning environments (DLEs; eg, via automated feedback). However, the accuracy of those models is a common concern for educational stakeholders (eg, policymakers, researchers, teachers and learners themselves). We compared the accuracy of four Dutch language models (ie, spaCy medium, spaCy large, FastText and ConceptNet NumberBatch) in the context of secondary school students' learning of causal relations from expository texts, scaffolded by causal diagram completion. Since machine learning relies on human‐labelled data for the best results, we used a dataset with 10,193 students' causal diagram answers, compiled over a decade of research using a diagram completion intervention to enhance students' monitoring of their text comprehension. The language models were used in combination with four popular machine learning classifiers (ie, logistic regression, random forests, support vector machine and neural networks) to evaluate their performance on automatically scoring students' causal diagrams in terms of the correctness of events and their sequence (ie, the causal structure). Five performance metrics were studied, namely accuracy, precision, recall, F1 and the area under the curve of the receiver operating characteristic (ROC‐AUC). The spaCy medium model combined with the neural network classifier achieved the best performance for the correctness of causal events in four of the five metrics, while the ConceptNet NumberBatch model worked best for the correctness of the causal sequence. These evaluation results provide a criterion for model adoption to adaptively support SRL of causal relations in DLEs.

Practitioner notes

What is already known about this topic
Accurate monitoring is a prerequisite for effective self‐regulation.
Students struggle to accurately monitor their comprehension of causal relations in texts.
Completing causal diagrams improves students' monitoring accuracy, but there is room for further improvement.
Automatic scoring could be used to provide adaptive support during diagramming.

What this paper adds
Comparison of four Dutch word vector models combined with four machine learning classifiers for the automatic scoring of students' causal diagrams.
Five performance metrics to evaluate the above solutions.
Evaluation of the word vector models for estimating the semantic similarity between student and model answers.

Implications for practice and/or policy
High‐quality word vector models could (em)power adaptive support during causal diagramming via automatic scoring.
The evaluated solutions can be embedded in digital learning environments (DLEs).
Criteria for model adoption to adaptively support SRL of causal relations in DLEs.
The increased saliency of (in)correct answers via automatic scoring might help to improve students' monitoring accuracy.
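
To make the five performance metrics named above concrete, the following minimal sketch computes accuracy, precision, recall, F1 and ROC-AUC for a binary scoring task (1 = answer scored as correct, 0 = incorrect). It is an illustration only and not the evaluation code used in the study: the labels and scores are made-up toy values, and the helper names (`confusion_counts`, `roc_auc`, `evaluate`) and the 0.5 decision threshold are assumptions introduced here.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'correct answer'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute the five metrics from true labels and classifier scores.

    The threshold (an assumption for this sketch) turns scores into 0/1
    predictions; ROC-AUC is threshold-free and uses the raw scores.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": roc_auc(y_true, y_score),
    }


# Toy data (hypothetical): human labels and classifier scores for eight answers.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.1, 0.3]

metrics = evaluate(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting several metrics side by side matters because they answer different questions: precision and recall separate the two kinds of scoring error (marking a wrong answer as correct versus missing a correct one), while ROC-AUC summarises how well the scores rank correct above incorrect answers independently of any single threshold.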