Lumbar disc herniation is a common degenerative lumbar disease with an increasing incidence. Percutaneous endoscopic lumbar discectomy treats lumbar disc herniation safely and effectively with a minimally invasive procedure. However, the learning curve of this technique is steep: novice surgeons are often not sufficiently proficient in endoscopic operations, which can easily lead to iatrogenic injury. Deep learning has already achieved satisfactory results in clinical diagnosis, treatment, and surgical navigation. The objective of our team was to develop a multi-element identification system for the visual field of endoscopic spine surgery using deep learning algorithms and to evaluate the feasibility of this system. We established an image database from surgical videos of 48 patients diagnosed with lumbar disc herniation, and two spinal surgeons labeled the images. We selected 6000 images of the visual field of percutaneous endoscopic spine surgery (including various tissue structures and surgical instruments) and divided them into training, validation, and test data in a 2:1:2 ratio. We developed convolutional neural network models based on four instance segmentation networks (SOLOv2, CondInst, Mask R-CNN, and YOLACT), each trained with a ResNet101 and a ResNet50 backbone. Mean average precision (mAP) and frames per second (FPS) were used to measure the performance of each model for classification, localization, and recognition in real time, and AP(average) was used to evaluate how easily each element is detected by the neural networks. Comparing the mAP and FPS of each model on the bounding box and segmentation tasks for the test set, we found that the SOLOv2 (ResNet101) (mAP = 73.5%, FPS = 28.9) and Mask R-CNN (ResNet101) (mAP = 72.8%, FPS = 28.5) models were the most stable, with higher precision and faster image processing speed.
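The 2:1:2 division of the 6000 images can be illustrated with a minimal sketch. It assumes a simple seeded random shuffle over image identifiers; the function name `split_by_ratio`, the seed, and the integer stand-ins for images are hypothetical and are not taken from the study, which does not state how images were assigned to each subset.

```python
import random


def split_by_ratio(items, ratios=(2, 1, 2), seed=0):
    """Shuffle items and cut them into consecutive parts sized by integer ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    total = sum(ratios)
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last part takes the remainder so rounding never drops an item.
        if i == len(ratios) - 1:
            end = len(shuffled)
        else:
            end = start + len(shuffled) * ratio // total
        parts.append(shuffled[start:end])
        start = end
    return parts


image_ids = range(6000)  # stand-in for the 6000 labeled images
train, val, test = split_by_ratio(image_ids)
print(len(train), len(val), len(test))  # 2400 1200 2400
```

With video-derived frames, grouping the split by patient rather than by image is a common way to keep near-identical frames from appearing in both training and test data; whether that was done here is not stated.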
Combining the average precision of each element in the bounding box and segmentation tasks across the networks, AP(average) among the surgical instruments was highest for tool 3 (bbox 0.85, segm 0.89) and lowest for tool 5 (bbox 0.63, segm 0.72). Among the anatomical tissue elements, the annulus fibrosus (bbox 0.68, segm 0.69) and ligamentum flavum (bbox 0.65, segm 0.62) had higher AP(average), while extradural fat (bbox 0.42, segm 0.44) had the lowest. Our team has developed a multi-element identification system for the visual field of percutaneous endoscopic spine surgery, adapted to the interlaminar and foraminal approaches, that can identify and track anatomical tissue (nerve, ligamentum flavum, nucleus pulposus, etc.) and surgical instruments (endoscopic forceps, a high-speed diamond burr, etc.). In the future, the system could be used as a virtual educational tool or applied to an intraoperative real-time assistance system for spinal endoscopic operations.
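As a rough illustration of the per-element metric, the sketch below assumes that AP(average) is the arithmetic mean of one element's AP over all evaluated networks, which is how the description above reads but is not spelled out as a formula. The function name `ap_average`, the model keys, and the numeric values are hypothetical placeholders and are not results from the study.

```python
from statistics import mean


def ap_average(ap_by_network):
    """Mean of one element's AP over all evaluated networks (assumed definition)."""
    return mean(ap_by_network.values())


# Hypothetical per-network bbox AP values for a single element.
# These numbers are placeholders for illustration, NOT data from the study.
example_bbox_ap = {
    "model_a": 0.70,
    "model_b": 0.60,
    "model_c": 0.80,
    "model_d": 0.50,
}

print(round(ap_average(example_bbox_ap), 2))  # 0.65
```

The same function would be applied separately to the bounding box and segmentation AP values of each element to obtain the paired bbox and segm figures.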