Cyber–physical systems have become critical across industries, driving investment in education services that develop well-trained engineers. Such services require hiring expert tutors with multidisciplinary knowledge and acquiring expensive facilities and equipment. To address the equipment and facility challenge, metaverse-based education services that incorporate digital twins have been explored. However, recruiting expert tutors who can enhance students’ achievements remains an unresolved issue, making it difficult to cultivate talent effectively. To address these issues, this paper proposes a reference architecture for a learner-centric educational metaverse with an intelligent tutoring framework as its core feature. Within this framework, we develop a novel explainable artificial intelligence (XAI) scheme for multi-class object detection models to assess learners’ achievements. Additionally, a genetic algorithm-based improvement search method is applied to the framework to derive personalized feedback. The proposed metaverse architecture and framework are evaluated through a case study on drone education. The experimental results show that the XAI scheme improves explanation accuracy by approximately 30% compared to existing methods. The survey results indicate that over 70% of learners significantly improved their skills based on the provided feedback.