In rehabilitation settings that exploit Mixed Reality, the clinician risks losing empathy with the patient because the two are immersed in different worlds, one partly virtual and one purely real. While the patient perceives the rehabilitation stimuli in a mixed real–virtual world, the clinician is immersed only in the real part. In rehabilitation, this may prevent the clinician from intervening; in skill assessment, it may make evaluation difficult. To overcome this limitation, we propose an innovative Augmented Reality (AR) framework for rehabilitation and skill assessment in clinical settings. Data acquired by a distributed sensor network feed a “shared AR” environment that both therapists and end-users can effectively operate and perceive, taking into account the specific interface requirements of each user category: (1) for patients, simplicity, immersiveness, engagement, and focus on the task; (2) for clinicians/therapists, contextualization and natural interaction with the whole set of data linked to the users’ performance in real time. The framework has strong potential in Occupational Therapy (OT) as well as in physical, psychological, and neurological rehabilitation. Hybrid real–virtual environments can be quickly developed and personalized to match end users’ abilities and emotional and physiological states and to evaluate nearly all relevant performance measures, thus augmenting the clinical eye of the therapist and the clinician–patient empathy. In this paper, we describe a practical application of the proposed framework in OT: setting up the table for eating. Both the therapist and the user wear a Microsoft HoloLens 2. First, the therapist sets up the table with virtual furniture. Then, the user places the corresponding real objects (matching also in shape) as closely as possible to the virtual ones. During the test, the therapist’s view is augmented with estimated motion, balance, and physiological cues. Once the training is completed, the therapist automatically sees the deviation in position and attitude of each object and the elapsed time. We used a camera-based localization algorithm that achieves an accuracy of 5 mm for position and 1° for rotation at a 95% confidence level. The framework was designed and tested in collaboration with clinical experts of the Villa Rosa rehabilitation hospital in Pergine (Italy), involving both patients and healthy users, to demonstrate the effectiveness of the designed architecture and the significance of the differences in the analyzed parameters between healthy users and patients.
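
As an illustrative aside (not part of the authors' system), the per-object deviations reported to the therapist can be understood as a position offset and an attitude difference between each virtual target pose and the measured pose of the real object placed by the user. The sketch below shows one common way to compute these two quantities; the pose representation (position in metres, orientation as a unit quaternion) and all names are assumptions.

```python
# Minimal sketch, assuming each object pose is given as a 3D position (metres)
# and a unit quaternion (w, x, y, z). Not the authors' implementation.
import numpy as np


def position_error_mm(p_virtual: np.ndarray, p_real: np.ndarray) -> float:
    """Euclidean distance between the target and measured positions, in millimetres."""
    return float(np.linalg.norm(p_real - p_virtual) * 1000.0)


def rotation_error_deg(q_virtual: np.ndarray, q_real: np.ndarray) -> float:
    """Angle of the relative rotation between two unit quaternions, in degrees."""
    q_virtual = q_virtual / np.linalg.norm(q_virtual)
    q_real = q_real / np.linalg.norm(q_real)
    # Absolute value handles the quaternion double cover (q and -q are the same rotation).
    dot = abs(float(np.dot(q_virtual, q_real)))
    return float(np.degrees(2.0 * np.arccos(np.clip(dot, 0.0, 1.0))))


if __name__ == "__main__":
    # Hypothetical poses for one table item placed by the user.
    target_pos = np.array([0.40, 0.00, 0.25])
    placed_pos = np.array([0.404, 0.001, 0.248])
    target_rot = np.array([1.0, 0.0, 0.0, 0.0])
    placed_rot = np.array([0.9997, 0.0, 0.0244, 0.0])
    print(f"position deviation: {position_error_mm(target_pos, placed_pos):.1f} mm")
    print(f"attitude deviation: {rotation_error_deg(target_rot, placed_rot):.1f} deg")
```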