Recent advances in computer vision have opened the door to scalable eye tracking using only a webcam. Such solutions are particularly useful for online educational technologies, where one goal is to respond adaptively to students' ongoing experiences. We used WebGazer, a webcam-based eye-tracker, to automatically detect covert cognitive states related to task-unrelated thought and comprehension during an online reading-comprehension task. We present data from two studies with different populations: (1) a relatively homogeneous sample of university students (N = 105), and (2) a more diverse sample from Prolific (N = 173, with < 20% White participants). Across both studies, the webcam-based eye-tracker provided sufficiently accurate and precise gaze measurements to predict both task-unrelated thought and reading comprehension from a single calibration. We also present initial evidence of predictive validity, including a positive correlation between predicted rates of task-unrelated thought and comprehension scores. Finally, we present slicing analyses to determine how performance changed under certain conditions (lighting, glasses, etc.) and to assess the generalizability of the results across the two datasets (e.g., training on data from Study 1 and testing on data from Study 2, and vice versa). We conclude by discussing the results in the context of remote research and learning technologies.