The performance evaluation of e-learning in medical education has been the subject of much recent research. Researchers have yet to reach a consensus on the definition of performance or on the constructs, metrics, models, and methods suitable for understanding student performance. Through a systematic review, this study puts forward a working definition of what constitutes performance evaluation, to reduce the ambiguity, arbitrariness, and multiplicity surrounding performance evaluation of e-learning in medical education. A systematic review of published articles on performance evaluation of e-learning in medical education was performed on the SCOPUS, Web of Science, PubMed, and EBSCOHost databases using search terms derived from the PICOS model. Following the PRISMA guidelines, relevant published papers were retrieved and exported to EndNote. Screening and quality appraisal were done in Rayyan. Three thousand four hundred and thirty-nine published studies were retrieved and screened using predetermined inclusion and exclusion criteria. One hundred and three studies passed all the criteria and were reviewed. The reviewed literature used 30 constructs to operationalize performance. The leading constructs are knowledge and effectiveness, which 60% of the authors of the reviewed literature used to define student performance. Knowledge gain, satisfaction, and learning outcome are the most common metrics, used by 81%, 26%, and 15% of the reviewed literature, respectively, to measure student performance. The study found that most researchers overlook the “e”, or electronic, component of e-learning when evaluating performance. The constructs operationalized and the metrics measured focused primarily on learning outcomes, with minimal attention to technology-related metrics or to the influence of the electronic mode of delivery on the learning process or evaluation outcome.
Only 6% of the reviewed literature applied evaluation models to guide the evaluation process, mostly the Kirkpatrick evaluation model. Most of the included studies used randomization as an experimental control method, mainly with pre- and post-test surveys. Modern evaluation methods were rarely used: only 1% of the reviewed literature used Google Analytics, and 2% used data from a learning management system. This study adds to the existing body of knowledge on performance evaluation of e-learning in medical education by providing a convergence of constructs, metrics, models, and methods. It also proposes a roadmap to guide the evaluation of students’ performance, built from the synthesis of findings and the gaps identified through the systematic review of existing literature in the domain. This roadmap will inform researchers of grey areas to consider when evaluating performance, to support higher-quality research outputs in the domain.