Endoscopic sinus surgery (ESS) is widely used to treat chronic sinusitis. However, it requires manipulating surgical instruments in a narrow surgical field close to vital organs such as the brain and eyes, so surgeons performing it need a high level of skill. In a previous study, endoscopic images and surgical navigation information were used to develop an automatic situation recognition method for ESS. In this study, we aimed to develop a more accurate automatic surgical situation recognition method for ESS by improving the method proposed in our previous study and adding post-processing to remove incorrect recognition results. We examined the training model parameters and the number of long short-term memory (LSTM) units, modified the input data augmentation method, and added post-processing. We then evaluated the modified method using clinical data. The modifications improved the overall scene recognition accuracy compared with the previous study, although phase recognition did not improve significantly. Applying a one-dimensional median filter to the time-series recognition results significantly reduced short-duration false recognition compared with the unfiltered results. Post-processing that constrains scene transitions is still needed to further improve recognition accuracy. These results suggest that scene recognition can be improved by tuning the model parameters and by adding a one-dimensional filter and post-processing. However, the scene recognition accuracy remained unsatisfactory. Thus, a more accurate scene recognition method and more appropriate post-processing are required.
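As an illustration of the one-dimensional median filtering step, the following is a minimal sketch rather than the implementation used in the study. It assumes that the recognizer outputs one integer scene label per frame; the function name `median_filter_labels`, the kernel size, the edge padding, and the sample labels are all hypothetical choices made for this example.

```python
from statistics import median_low


def median_filter_labels(labels, kernel_size=5):
    """Smooth a per-frame label sequence with a sliding 1D median.

    Assumption: labels are integers, one per frame. kernel_size is an
    illustrative value, not one taken from the study.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    if not labels:
        return []
    half = kernel_size // 2
    # Repeat the edge labels so the output keeps the input length.
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # With an odd window, median_low always returns a label from the window.
    return [median_low(padded[i:i + kernel_size]) for i in range(len(labels))]


# Hypothetical per-frame output with two single-frame misrecognitions.
raw = [0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 1, 1]
smoothed = median_filter_labels(raw, kernel_size=3)
print(smoothed)
```

In this sketch, isolated single-frame labels are replaced by the surrounding label, while the sustained change from scene 0 to scene 1 is preserved. A median over label indices treats the labels as ordered, which is itself an assumption; errors lasting longer than half the window pass through unchanged, which is consistent with the need for additional transition constraints.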