Recent advances in extended reality (XR) technology have opened the possibility of significantly improving telemedicine systems, primarily by transferring 3D information about the patient's state, which is used to create more immersive experiences on VR/AR headsets. In this paper, we propose an XR-based telemedicine collaboration system in which the patient is represented as a 3D avatar in an XR space shared by local and remote clinicians. The proposed system consists of an AR client application running on the Microsoft HoloLens 2, used by a local clinician; a VR client application running on the HTC Vive Pro, used by a remote clinician; and a backend running on a server. The patient is captured by a camera on the AR side, and 3D body pose estimation is performed on frames from this camera stream to form a 3D patient avatar. In addition, both the AR and VR sides can interact with the patient avatar via virtual hands, and annotations can be placed on the 3D model. The main contribution of our work is the use of 3D body pose estimation to create the 3D patient avatar. In this way, 3D body reconstruction using depth cameras is avoided, which reduces system complexity as well as hardware and network requirements. Another contribution is the novel architecture of the proposed system, in which audio and video streaming are realized using the WebRTC protocol. The performance evaluation shows that the proposed system ensures high frame rates for both the AR and VR client applications, while processing latency remains at an acceptable level.