Water is fundamental to human survival and development, and it is of great importance for ecological and environmental protection. The Hindu Kush Himalaya (HKH) is known as the “Water Tower of Asia”, and its water resources influence the global water cycle and ecosystems. It is therefore important to measure the status of water in this region efficiently and to monitor its changes. With the development of satellite-borne sensors, water surface extraction from remote sensing images has become an important means of doing so, and deep learning networks are among the most advanced and accurate methods for this task. We designed a network based on the state-of-the-art Vision Transformer to automatically extract water surfaces in the HKH region. In this region, however, terrain shadows are often misclassified as water surfaces because of their spectral similarity. We therefore adjusted the training dataset in different ways to improve the accuracy of water surface extraction and examined whether these adjustments help to reduce the interference of terrain shadows. Our experimental results show that, with the designed network, adding terrain shadow samples significantly improves the accuracy of water surface extraction in high mountainous areas such as the HKH region, whereas adding terrain data does not reduce the interference from terrain shadows. Using the network trained on a dataset containing both water surface and terrain shadow samples, we obtained water surface extraction results for the HKH region in 2021. A comparison with the Global Surface Water data products shows that our results are highly accurate and that the extracted water surface boundaries are finer, which strongly confirms the applicability and advantages of the proposed approach across a wide range of complex surface environments.
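
The comparison against a reference product mentioned above can be pictured as a pixel-wise agreement check between two binary water masks. The following is a minimal sketch only. The abstract does not state which metrics were used, so the choice of overall accuracy, precision, recall, and intersection over union is an assumption, as are the function name `mask_agreement` and the toy masks, which are made-up values rather than data from the study.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 values; 1 marks a water pixel


def mask_agreement(predicted: Mask, reference: Mask) -> Dict[str, float]:
    """Pixel-wise agreement between two binary water masks of equal shape.

    Hypothetical helper for illustration; not the evaluation code of the study.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of rows")

    tp = fp = fn = tn = 0
    for pred_row, ref_row in zip(predicted, reference):
        if len(pred_row) != len(ref_row):
            raise ValueError("mask rows must have the same length")
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1  # water in both masks
            elif p:
                fp += 1  # extracted as water, absent from the reference
            elif r:
                fn += 1  # water in the reference, missed by the extraction
            else:
                tn += 1  # non-water in both masks

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "overall_accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


# Toy 3x4 masks (assumed values). The lone 1 in the last row of `extracted`
# stands in for a non-water pixel, such as a terrain shadow, labeled as water.
extracted = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
reference = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
scores = mask_agreement(extracted, reference)
```

In this toy case a single false positive lowers precision and intersection over union, which is the kind of effect that misclassified terrain shadows would have on an extracted water mask.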