As populations age and the pressures of daily life increase, many elderly people do not receive sufficient care, which is intensifying demand for home-based intelligent massage healthcare robots. Users of massage robots find it difficult to indicate a specific area, such as a region of the back, through gestures or voice, and elderly users in particular often express themselves imprecisely, so the desired massage target location is not communicated effectively to the robot. To address these problems, this paper proposes a new human-machine interaction mode for massage robots. The main innovations are: (1) precise control of the massage area through a virtual human body displayed on the screen; (2) a virtual-to-real mapping algorithm based on human acupoints, which addresses the consistency problem in mapping between the virtual and the real human body; and (3) a multimodal intent understanding algorithm based on dynamic information entropy, which addresses the reliance on a single interaction mode and the low intent comprehension rates of current massage robot interactions. Experimental results demonstrate that the proposed multimodal massage localization algorithm recognizes massage intent well. By combining natural human-machine interaction with intent understanding, the approach accurately captures the user's massage intentions and assists in completing massage tasks, while also reducing the user's psychological and cognitive load, leading to more desirable interaction outcomes.
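The idea of weighting interaction modalities by information entropy can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details are not given here. It assumes each modality (for example, touch on the virtual body and speech) produces a normalized probability distribution over candidate massage regions, and that a modality with lower Shannon entropy is more certain and should count for more. The function names, region labels, and the weighting rule are all hypothetical. Recomputing the weights on every interaction turn is one possible reading of "dynamic".

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def fuse_intents(modality_probs):
    """Fuse per-modality region distributions with entropy-based weights.

    modality_probs maps a modality name to a dict of region -> probability.
    Each distribution is assumed to be normalized (sums to 1).
    Returns (fused distribution, per-modality weights).
    """
    regions = sorted({r for dist in modality_probs.values() for r in dist})
    # Maximum possible entropy for this number of regions (uniform case).
    max_h = math.log2(len(regions)) if len(regions) > 1 else 1.0

    # Assumed weighting rule: weight = 1 - normalized entropy, so a
    # confident (low-entropy) modality gets a weight close to 1.
    weights = {}
    for name, dist in modality_probs.items():
        h = shannon_entropy([dist.get(r, 0.0) for r in regions])
        weights[name] = max(0.0, 1.0 - h / max_h)

    total = sum(weights.values())
    if total == 0:
        # Every modality is maximally uncertain: fall back to equal weights.
        weights = {name: 1.0 for name in modality_probs}
        total = float(len(weights))

    fused = {
        r: sum(weights[n] * modality_probs[n].get(r, 0.0) for n in modality_probs) / total
        for r in regions
    }
    return fused, weights


# Hypothetical single interaction turn: a clear touch, an ambiguous utterance.
observations = {
    "touch": {"upper_back": 0.85, "lower_back": 0.10, "shoulder": 0.05},
    "speech": {"upper_back": 0.40, "lower_back": 0.35, "shoulder": 0.25},
}
fused, weights = fuse_intents(observations)
target = max(fused, key=fused.get)
print(target, weights)
```

In this sketch the ambiguous speech input contributes little to the fused estimate, so an imprecise verbal description does not override a clear selection on the virtual body.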