This paper proposes a novel teleoperation method that allows users to guide a robot directly by hand while giving spoken instructions. In this method, a virtual robot modeled on the remote real robot is projected into the local real environment to form a 3D operation interface, so that users can interact with virtual objects directly with their hands. Furthermore, because the Leap Motion sensor is attached to the augmented reality (AR) glasses, the operation space is greatly extended: users can move freely and observe the virtual robot from an arbitrary angle without blind spots, which enhances their sense of immersion and provides more natural human-machine interaction. To improve measurement accuracy, an unscented Kalman filter (UKF) and an improved particle filter (IPF) are used to estimate the position and the orientation of the hand, respectively. In addition, Term Frequency-Inverse Document Frequency (TF-IDF) and a maximum entropy model are adopted to recognize the user's speech and gesture instructions. The proposed method is compared with three other human-machine interaction methods in a series of experiments. The results verify the effectiveness of the proposed method.
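The TF-IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tf_idf`, the example phrases, and the unsmoothed formula (term frequency times the log of inverse document frequency) are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the plain textbook form tf * log(N / df); the paper may use
    a different variant (assumption).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical instruction phrases, invented for this example.
commands = [
    "move the gripper to the left".split(),
    "move the gripper to the right".split(),
    "rotate the wrist slowly".split(),
]
weights = tf_idf(commands)
```

In this toy corpus, a word that occurs in every phrase (such as "the") receives a weight of zero, while a word unique to one phrase (such as "left") receives the largest weight in that phrase, which is the property that makes TF-IDF useful for picking out the distinctive keywords of an instruction.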