Thanks to the advent of telepresence applications, we can remotely control and operate industrial machinery. Teleoperation removes operators from hazardous workplaces such as mines and plays an essential role in worker safety. In addition, augmented telepresence can introduce information that helps the user understand the remote scene. However, remote operation presents challenges because the information received is more limited than what an operator could perceive when physically present; for example, depth is harder to judge accurately. This study investigates how well operators interact with an augmented remote operation scaling system (AROSS) in a mining context when different computer-generated visual interfaces are provided. The system provides five visual interfaces: Disocclusion Augmentation view using selective content removal, Novel Perspective view generation, Lidar view, Right (Original) view, and Left (Original) view. We performed two experiments in a mine-like laboratory to analyze human interaction with the designed prototype, applying a mixed research methodology that used questionnaires, interviews, and observations. This mixed methodology combined quality of experience methods, used to discover the users’ requirements from a technological standpoint, with user experience methods (i.e., user-centric approaches). We investigated the interactions of 10 and 11 users in the two experiments. The first experiment focused on identifying small patterns (e.g., cracks in the mine wall), and the second focused on depth and three-dimensional understanding. We considered the first experiment a feasibility test to understand how to conduct the second experiment. Accordingly, we designed the second experiment to assess the technical readiness of AROSS from the users’ perspective. Taken together, the results give a comprehensive understanding of users’ perceptions and experiences.
The quality of experience results favored the Left and Right (Original) views for remote control, indicating that remote operators prefer natural (Original) views because these make the environment easier to comprehend. User experience analysis revealed why the other views were less favored and what their potential benefits are. Specifically, the Novel Perspective and Lidar views were found helpful for depth perception, and the functionality of the Disocclusion Augmentation view could be enhanced if robot arm position tracking were enabled. These insights inform design recommendations, emphasizing the value of incorporating the Disocclusion Augmentation and Novel Perspective views and suggesting improvements to enhance system usability.