Abstract

Technology‐based, open‐ended learning environments (OELEs) can capture detailed information about students' interactions as they work through a task or solve a problem embedded in the environment. This information, in the form of log data, has the potential to provide important insights into the practices students adopt for scientific inquiry and problem solving. How the log data are parsed and analysed to reveal evidence of multifaceted constructs such as inquiry and problem solving holds the key to making interactive learning environments useful for assessing students' higher‐order competencies. In this paper, we present a systematic review of studies that used log data generated in OELEs to describe, model and assess scientific inquiry and problem solving. We identify and analyse 70 conference proceedings and journal papers published between 2012 and 2021. Our results reveal large variations in OELE and task characteristics, in the approaches used to extract features from log data and in the interpretation models used to link features to target constructs. While the educational data mining and learning analytics communities have made progress in leveraging log data to model inquiry and problem solving, multiple barriers still hamper the production of representative, reproducible and generalizable results. Based on the trends identified, we lay out a set of recommendations pertaining to key aspects of the workflow that we believe will help the field develop more systematic approaches to designing and using OELEs for studying how students engage in inquiry and problem‐solving practices.

Practitioner notes

What is already known about this topic
- Research has shown that technology‐based, open‐ended learning environments (OELEs) that collect users' interaction data are potentially useful tools for engaging students in practice‐based STEM learning.
- More work is needed to identify generalizable principles of how to design OELE tasks to support student learning and how to analyse the log data to assess student performance.

What this paper adds
- We identified multiple barriers to the production of sufficiently generalizable and robust results to inform practice, with respect to: (1) the design characteristics of the OELE‐based tasks, (2) the target competencies measured, (3) the approaches and techniques used to extract features from log files and (4) the models used to link features to the competencies.
- Based on this analysis, we provide a series of specific recommendations to inform future research and facilitate the generalizability and interpretability of results:
  - Making the data available in open‐access repositories, similar to the PISA tasks, for easy access and sharing.
  - Defining target practices more precisely to better align task design with target practices and to facilitate between‐study comparisons.
  - Evaluating OELE and task designs more systematically to improve the psychometric properties of OELE‐based measurement tasks and analysis processes.
  - Focusing more on internal and external validation of both feature generation processes and statistical models, for example with data from different samples or by systematically varying the analysis methods.

Implications for practice and/or policy
- Using the framework of evidence‐centered assessment design, we have identified relevant criteria for organizing and evaluating the diverse body of empirical studies on this topic, which policy makers and practitioners can use for their own further examinations.
- This paper identifies promising research and development areas in the measurement and assessment of higher‐order constructs with process data from OELE‐based tasks that government agencies and foundations can support.
- Researchers, technologists and assessment designers may find the insights and recommendations useful for understanding how OELEs can enhance science assessment through thoughtful integration of learning theories, task design and data mining techniques.