Feature-based 3D reconstruction and tracking are now widely used in the medical field. Feature matching is the most important step in feature-based 3D reconstruction, because its accuracy directly determines the accuracy of the 3D point cloud coordinates computed afterwards. However, traditional feature matching methods perform poorly in this setting. To overcome this limitation, a matching method based on a convolutional neural network (CNN) is presented. The network is trained on a training set collected from a video sequence of a given length, beginning at the starting frame. Matched feature points in different endoscopic video frames are treated as belonging to the same category, so that feature points in subsequent frames can be matched by network classification. The proposed method is validated on a video of a silicone heart phantom and on an endoscopic video of an in vivo beating heart acquired with a da Vinci surgical robot. Experimental comparisons with the SURF and ORB algorithms, as well as other methods, show that the CNN-based feature matching algorithm is effective in terms of matching performance, rotation invariance, and scale invariance. For the first 200 frames of the video, the matching accuracy reached 90%.
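The idea of matching by classification can be illustrated with a minimal sketch. It assumes a trained classifier has already produced, for each candidate feature point in a new frame, a score per category, where each category stands for one feature point tracked through the training frames. The function name `match_by_classification`, the confidence threshold `min_conf`, and the rule of keeping only the highest-scoring candidate per category are illustrative assumptions, not details taken from the method described above.

```python
def match_by_classification(probs, min_conf=0.5):
    """Assign candidate feature points to tracked-point categories.

    probs[i][k] is the (assumed) classifier score that candidate i in the
    current frame belongs to category k. Returns {category: candidate_index}.
    The threshold and the one-candidate-per-category rule are assumptions.
    """
    best = {}  # category -> (score, candidate_index)
    for i, row in enumerate(probs):
        # Predicted category is the one with the highest score.
        k = max(range(len(row)), key=row.__getitem__)
        score = row[k]
        if score < min_conf:
            continue  # prediction too uncertain to count as a match
        # Keep only the most confident candidate for each category.
        if k not in best or score > best[k][0]:
            best[k] = (score, i)
    return {k: i for k, (score, i) in best.items()}


# Hypothetical scores for four candidates over three categories.
example_probs = [
    [0.90, 0.05, 0.05],
    [0.60, 0.30, 0.10],
    [0.10, 0.20, 0.70],
    [0.40, 0.30, 0.30],
]
matches = match_by_classification(example_probs)
```

In this example, candidates 0 and 1 both predict category 0, so only the more confident candidate 0 is kept; candidate 3 falls below the threshold and is left unmatched.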